S3 bulk renaming


A customer has 200,000 CSV files in a single S3 bucket, with no key prefixes.
The files range in size from 0 KB to 400 MB, and all of them need to be renamed.
The customer is running the S3 mv command from two EC2 instances to rename the files, but it is slow and keeps timing out.

Is there a more efficient method or a better approach?

AWS
asked 6 years ago, 1765 views
1 Answer
Accepted Answer

I have logged a support ticket, and here is a list of things to check:

Are the EC2 instances in the same region as the S3 bucket? Also, I suspect they are using the CLI's default concurrency setting. By default, the S3 CLI uses 10 concurrent requests (the max_concurrent_requests setting). Ask the customer to increase it so the operation makes fuller use of the available network bandwidth and finishes faster. Read more about the S3 CLI configuration here: https://docs.aws.amazon.com/cli/latest/topic/s3-config.html

Also, are they using a NAT gateway or a VPC endpoint, or going through the public Internet? If they happen to be using a NAT instance, make sure the NAT instance has enough resources to handle the requests and traffic.

They can also look into using an SDK to rename the files if the CLI seems to be the bottleneck (it should not be, though). For example, the Python SDK (boto3) has a copy_from function that copies an object to a new key; followed by a delete of the original key, this amounts to a rename, and most of the metadata is retained too: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#S3.Object.copy_from

ACLs won't be preserved by the copy, though. They will have to pass the ACL as one of the parameters to the function.
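
As a rough illustration of the SDK approach, here is a minimal sketch of a concurrent copy-then-delete rename. It uses the client-level copy_object call rather than Object.copy_from, because a single boto3 client can be shared across threads. The bucket name, the "renamed/" naming rule, the worker count of 32, and the helper names rename_object, rename_many and main are all assumptions made for the example, not something from the customer's setup.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def rename_object(client, bucket, old_key, new_key, acl=None):
    """Copy old_key to new_key within one bucket, then delete old_key."""
    # ACLs are not carried over by the copy, so pass one explicitly if needed.
    extra = {"ACL": acl} if acl else {}
    # Server-side copy; a single copy_object call handles objects up to 5 GB,
    # which covers the 0 KB to 400 MB files described in the question.
    client.copy_object(
        Bucket=bucket,
        Key=new_key,
        CopySource={"Bucket": bucket, "Key": old_key},
        **extra,
    )
    # Reached only if the copy did not raise an exception.
    client.delete_object(Bucket=bucket, Key=old_key)
    return old_key, new_key


def rename_many(client, bucket, mapping, max_workers=32, acl=None):
    """Rename {old_key: new_key} pairs concurrently; returns (renamed, failed)."""
    renamed, failed = [], []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {
            pool.submit(rename_object, client, bucket, old, new, acl): old
            for old, new in mapping.items()
        }
        for fut in as_completed(futures):
            try:
                renamed.append(fut.result())
            except Exception as exc:  # collect failures so they can be retried
                failed.append((futures[fut], exc))
    return renamed, failed


def main(bucket):
    """Hypothetical driver: move every .csv key under a 'renamed/' prefix."""
    import boto3
    from botocore.config import Config

    # Size the HTTP connection pool to match the number of worker threads.
    s3 = boto3.client("s3", config=Config(max_pool_connections=32))
    mapping = {}
    for page in s3.get_paginator("list_objects_v2").paginate(Bucket=bucket):
        for obj in page.get("Contents", []):
            key = obj["Key"]
            if key.endswith(".csv"):
                mapping[key] = "renamed/" + key  # assumed naming rule
    renamed, failed = rename_many(s3, bucket, mapping, max_workers=32)
    print(f"renamed={len(renamed)} failed={len(failed)}")
```

Calling main("example-bucket") (a placeholder bucket name) from an EC2 instance in the same region as the bucket would run the whole rename. Keys that end up in the failed list still exist under their old names, so the same call can be repeated for just those keys.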

AWS
answered 6 years ago
EXPERT
reviewed 23 days ago
