S3 bulk renaming


A customer has 200,000 CSV files in a single S3 bucket, all at the top level with no prefix.
The files range from 0 KB to 400 MB, and they need to rename all of them.
They are running the S3 mv command (aws s3 mv) from two EC2 instances, but it is slow and keeps timing out.

Is there a more efficient method or a better approach?

AWS
Asked 6 years ago, 1,766 views
1 answer
Accepted answer

I logged a support ticket, and here is the list of things to check:

Are the EC2 instances in the same Region as the S3 bucket? Also, they are probably using the CLI's default concurrency setting: by default, the AWS CLI's S3 commands use 10 concurrent requests (max_concurrent_requests). Ask the customer to increase this value so the operation makes full use of the available network bandwidth and finishes as fast as possible. Read more about the S3 CLI configuration here: https://docs.aws.amazon.com/cli/latest/topic/s3-config.html
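The concurrency limit itself is a one-line CLI setting (for example aws configure set default.s3.max_concurrent_requests 50; the right number depends on the instance size and should be tuned). The Region question can be checked programmatically. Below is a minimal sketch with boto3, assuming the instance's Region is passed in on the command line; the script name and argument order are hypothetical. It handles the GetBucketLocation quirk where buckets in us-east-1 report a null LocationConstraint and some legacy eu-west-1 buckets report "EU".

```python
import sys


def normalize_bucket_region(location_constraint):
    """Turn a GetBucketLocation LocationConstraint value into a Region name.

    S3 reports buckets in us-east-1 with a null LocationConstraint, and some
    very old eu-west-1 buckets with the legacy value "EU".
    """
    if not location_constraint:
        return "us-east-1"
    if location_constraint == "EU":
        return "eu-west-1"
    return location_constraint


def bucket_region(s3_client, bucket):
    """Return the Region that hosts `bucket`."""
    resp = s3_client.get_bucket_location(Bucket=bucket)
    return normalize_bucket_region(resp.get("LocationConstraint"))


def check_same_region(s3_client, bucket, instance_region):
    """Return (same_region, bucket_region) for a bucket/instance pair."""
    region = bucket_region(s3_client, bucket)
    return region == instance_region, region


if __name__ == "__main__":
    import boto3

    # Hypothetical usage: python check_region.py BUCKET INSTANCE_REGION
    bucket_name, instance_region = sys.argv[1], sys.argv[2]
    same, region = check_same_region(boto3.client("s3"), bucket_name, instance_region)
    print(f"bucket is in {region}; same Region as instance: {same}")
```

If the Regions differ, every copy crosses Regions, which adds latency and data transfer cost; running the rename from instances in the bucket's Region avoids that.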

Also, is their traffic going through a NAT gateway, a VPC endpoint, or the public internet? If they are using a NAT instance, make sure the NAT instance has enough resources to handle the request volume and traffic.
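One way to take the NAT device out of the path is a gateway VPC endpoint for S3, which is associated with route tables rather than instances. Here is a minimal boto3 sketch for checking whether the instances' VPC already has one and whether the instances' route table uses it. It assumes the VPC ID, Region, and the subnet's route table ID are already known (looking them up is left out), and the command-line usage is hypothetical.

```python
import sys


def s3_gateway_endpoints(ec2_client, vpc_id, region):
    """Return the S3 gateway endpoints that exist in `vpc_id`."""
    resp = ec2_client.describe_vpc_endpoints(
        Filters=[
            {"Name": "vpc-id", "Values": [vpc_id]},
            {"Name": "service-name", "Values": [f"com.amazonaws.{region}.s3"]},
            {"Name": "vpc-endpoint-type", "Values": ["Gateway"]},
        ]
    )
    # A handful of endpoints at most, so pagination is ignored here.
    return resp.get("VpcEndpoints", [])


def route_table_uses_endpoint(endpoints, route_table_id):
    """True if any of the endpoints is associated with `route_table_id`."""
    return any(route_table_id in ep.get("RouteTableIds", []) for ep in endpoints)


if __name__ == "__main__":
    import boto3

    # Hypothetical usage: python check_endpoint.py VPC_ID REGION ROUTE_TABLE_ID
    vpc_id, region, route_table_id = sys.argv[1], sys.argv[2], sys.argv[3]
    eps = s3_gateway_endpoints(boto3.client("ec2", region_name=region), vpc_id, region)
    print(f"S3 gateway endpoints: {[ep['VpcEndpointId'] for ep in eps]}")
    print(f"route table {route_table_id} uses one: {route_table_uses_endpoint(eps, route_table_id)}")
```

An endpoint that exists in the VPC but is not associated with the instances' route table does not change their path, which is why the second check is there.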

They can also use an SDK to rename the files if the CLI turns out to be the bottleneck (it should not be, though). S3 has no native rename operation, so a rename is a copy to the new key followed by a delete of the original. For example, in Python, boto3's copy_from method performs the copy step, and they can retain most of the metadata too: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#S3.Object.copy_from
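A minimal sketch of the SDK route, parallelized with a thread pool. copy_from on S3.Object wraps the CopyObject API, but boto3 resource objects are not thread-safe, so this sketch calls the equivalent client methods copy_object and delete_object from worker threads instead. A single CopyObject call handles objects up to 5 GB, which covers the 400 MB maximum here. The naming rule new_key_for (prefixing keys with renamed/) and the worker count of 32 are hypothetical placeholders.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def new_key_for(old_key):
    """Hypothetical naming rule; replace with the customer's real mapping."""
    return "renamed/" + old_key


def list_keys(s3_client, bucket):
    """Yield every key in the bucket, following list_objects_v2 pagination."""
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket):
        for obj in page.get("Contents", []):
            yield obj["Key"]


def rename_object(s3_client, bucket, old_key, new_key):
    """'Rename' one object: copy it to new_key, then delete old_key."""
    s3_client.copy_object(
        Bucket=bucket, Key=new_key, CopySource={"Bucket": bucket, "Key": old_key}
    )
    s3_client.delete_object(Bucket=bucket, Key=old_key)
    return new_key


def rename_all(s3_client, bucket, keys, mapper=new_key_for, max_workers=32):
    """Rename `keys` concurrently; return (renamed_keys, [(key, error), ...])."""
    done, failed = [], []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {
            pool.submit(rename_object, s3_client, bucket, k, mapper(k)): k for k in keys
        }
        for fut in as_completed(futures):
            try:
                done.append(fut.result())
            except Exception as exc:  # keep going; report failures at the end
                failed.append((futures[fut], repr(exc)))
    return done, failed


if __name__ == "__main__":
    import sys

    import boto3
    from botocore.config import Config

    bucket = sys.argv[1]
    # Match the HTTP connection pool to the thread count (botocore default is 10).
    client = boto3.client("s3", config=Config(max_pool_connections=32))
    # Snapshot the listing first so newly renamed keys are not picked up again.
    keys = list(list_keys(client, bucket))
    done, failed = rename_all(client, bucket, keys)
    print(f"renamed {len(done)}, failed {len(failed)}")
```

Collecting failures instead of aborting means a timeout on one object does not stop the other 199,999; the failed keys can simply be retried in a second pass.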

ACLs won't be preserved, though. They will have to pass the ACL as one of the parameters to the function.
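When a canned ACL is enough, the ACL parameter on the copy call covers it. To carry over the exact grants instead, one option is to read them with GetObjectAcl before the copy and re-apply them with PutObjectAcl afterwards. This is a minimal per-object sketch of that approach using boto3 client methods; it assumes the bucket still has ACLs enabled. If the bucket's Object Ownership is set to "Bucket owner enforced", ACLs are disabled and this step is unnecessary.

```python
def rename_preserving_acl(s3_client, bucket, old_key, new_key):
    """Copy old_key to new_key, re-apply the source ACL, then delete old_key."""
    # Read the grants first, while the source object still exists.
    acl = s3_client.get_object_acl(Bucket=bucket, Key=old_key)
    s3_client.copy_object(
        Bucket=bucket, Key=new_key, CopySource={"Bucket": bucket, "Key": old_key}
    )
    # CopyObject gives the new object a default (private) ACL rather than
    # the source's, so put the original owner and grants back explicitly.
    s3_client.put_object_acl(
        Bucket=bucket,
        Key=new_key,
        AccessControlPolicy={"Owner": acl["Owner"], "Grants": acl["Grants"]},
    )
    # Delete the original only after the copy and the ACL update succeeded.
    s3_client.delete_object(Bucket=bucket, Key=old_key)
```

This adds two extra requests per object, so it is worth skipping when the objects only ever had the default private ACL.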

AWS
Answered 6 years ago
Expert
Reviewed 23 days ago
