S3 bulk renaming

A customer has 200,000 CSV files in one S3 bucket, with no prefix.
The files range in size from 0 KB to 400 MB, and all of them need to be renamed.
They are running the S3 mv command from two EC2 instances to rename the files, but it is slow and keeps timing out.

Is there a more efficient method or a better approach?

asked 4 years ago, 320 views
1 Answer
Accepted Answer

I have logged a support ticket and here is the list of things that we can look out for:

Are the EC2 instances in the same Region as the S3 bucket? Also, I believe they might be using the CLI's default concurrency setting. By default, the AWS CLI S3 commands use 10 concurrent requests (the max_concurrent_requests setting). Ask the customer to raise it, for example with aws configure set default.s3.max_concurrent_requests 20, so the operation makes fuller use of the available network bandwidth and finishes sooner. Read more about the S3 CLI configuration here: https://docs.aws.amazon.com/cli/latest/topic/s3-config.html
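The Region question above can be checked from code as well. The following is a minimal sketch, assuming a boto3 S3 client is passed in; the helper names bucket_region and same_region are hypothetical, and the instance Region has to be supplied by the caller.

```python
def bucket_region(client, bucket):
    """Return the Region a bucket lives in, normalizing legacy values."""
    # client is assumed to be a boto3 S3 client: boto3.client("s3")
    location = client.get_bucket_location(Bucket=bucket).get("LocationConstraint")
    if location is None:
        # S3 reports buckets in us-east-1 with an empty LocationConstraint.
        return "us-east-1"
    if location == "EU":
        # Legacy value that S3 may return for buckets in eu-west-1.
        return "eu-west-1"
    return location


def same_region(client, bucket, instance_region):
    """True when the bucket and the EC2 instance are in the same Region."""
    return bucket_region(client, bucket) == instance_region
```

For example, same_region(boto3.client("s3"), "my-bucket", "us-east-1"), where "my-bucket" is a placeholder bucket name and the Region string is whatever Region the EC2 instances run in.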

Also, are they going through a NAT gateway, a VPC endpoint, or the public internet? If they happen to be using a NAT instance, make sure the NAT instance has enough resources to handle the request volume and traffic.

They can also look into using an SDK to rename the files if the CLI turns out to be the bottleneck (it should not be, though). For example, the Python SDK (boto3) has the copy_from function, which copies an object to a new key. S3 has no native rename operation, so deleting the original object after the copy completes the rename. The copy retains most of the metadata too: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#S3.Object.copy_from

ACLs won't be preserved, though. They will have to pass the ACL as one of the parameters to the function.
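The SDK approach described above could look roughly like the sketch below. It is an illustration under stated assumptions, not a tested migration tool: it uses the client-level copy_object and delete_object calls (the same underlying copy operation as Object.copy_from) because boto3 clients are generally safe to share across threads while resource objects are not. The function names, the worker count, and the extra_copy_args parameter are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def rename_object(client, bucket, old_key, new_key, extra_copy_args=None):
    """Copy old_key to new_key, then delete old_key."""
    # client is assumed to be a boto3 S3 client: boto3.client("s3")
    # extra_copy_args can carry an ACL, since ACLs are not copied,
    # e.g. {"ACL": "bucket-owner-full-control"}.
    client.copy_object(
        Bucket=bucket,
        Key=new_key,
        CopySource={"Bucket": bucket, "Key": old_key},
        **(extra_copy_args or {}),
    )
    # Only reached if the copy succeeded, so a failed copy never loses data.
    client.delete_object(Bucket=bucket, Key=old_key)


def rename_many(client, bucket, renames, max_workers=32, extra_copy_args=None):
    """Rename (old_key, new_key) pairs in parallel; return (done, failed)."""
    done, failed = [], []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {
            pool.submit(rename_object, client, bucket, old, new, extra_copy_args): (old, new)
            for old, new in renames
        }
        for future in as_completed(futures):
            pair = futures[future]
            try:
                future.result()
                done.append(pair)
            except Exception as exc:
                # Keep going and report failures so they can be retried later.
                failed.append((pair, exc))
    return done, failed
```

A call might look like rename_many(boto3.client("s3"), "my-bucket", [("a.csv", "renamed/a.csv")]), where the bucket name and keys are placeholders. Collecting failures instead of stopping at the first error lets a run over 200,000 objects be retried for only the pairs that failed. A single copy_object call handles objects up to 5 GB, which covers the 400 MB files mentioned in the question.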

answered 4 years ago
