S3 bulk renaming

A customer has 200,000 CSV files in a single S3 bucket, with no prefixes.
The files vary in size from 0 KB to 400 MB. The customer needs to rename all of them.
They are running the S3 mv command from two EC2 instances, but it is slow and keeps timing out.

Is there a more efficient method or a better approach?

AWS
asked 6 years ago, 1765 views
1 Answer
Accepted answer

I have logged a support ticket, and here is a list of things to check:

Are the EC2 instances in the same region as the S3 bucket? Also, I suspect they are using the CLI's default concurrency setting. By default, the S3 CLI issues 10 concurrent requests. Ask the customer to raise that value (max_concurrent_requests) so the operation makes full use of the available network bandwidth and finishes as quickly as possible. Read more about the S3 CLI configuration here: https://docs.aws.amazon.com/cli/latest/topic/s3-config.html

Also, are they going through a NAT gateway, a VPC endpoint, or the public Internet? If they happen to be using a NAT instance, make sure the NAT instance has enough resources to handle the request volume and traffic.

They can also look into using an SDK to rename the files if the CLI seems to be the bottleneck (it should not be, though). S3 has no native rename operation, so a rename is a copy to the new key followed by a delete of the old one. For example, boto3 for Python has the copy_from function, which performs the copy on the S3 side and retains most of the metadata too: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#S3.Object.copy_from

ACLs won't be preserved, though. They will have to pass the ACL as one of the parameters to the function.
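
As a rough illustration of the SDK approach, here is a minimal sketch of a concurrent copy-then-delete rename. It is not the customer's code and has not been run against their bucket. It assumes boto3 is installed and uses the low-level client's copy_object call (the same CopyObject operation behind Object.copy_from), because boto3 documents clients as generally thread-safe while resources are not. The function names, the worker count, and the "renamed/" key mapping in the usage comment are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def rename_object(client, bucket, old_key, new_key, acl=None):
    """Rename one object: server-side copy to new_key, then delete old_key.

    `client` is assumed to be a boto3 S3 client, e.g. boto3.client("s3").
    A single copy_object call handles objects up to 5 GB, which covers
    the 400 MB maximum mentioned in the question.
    """
    # ACLs are not carried over by the copy, so pass one explicitly if needed.
    extra = {"ACL": acl} if acl else {}
    client.copy_object(
        Bucket=bucket,
        Key=new_key,
        CopySource={"Bucket": bucket, "Key": old_key},
        **extra,
    )
    # Reached only if the copy did not raise, so the original is never lost.
    client.delete_object(Bucket=bucket, Key=old_key)


def rename_many(client, bucket, renames, workers=32, acl=None):
    """Apply (old_key, new_key) pairs concurrently.

    Returns a list of (old_key, exception) for every rename that failed,
    so the caller can retry just those keys.
    """
    def _one(pair):
        old_key, new_key = pair
        try:
            rename_object(client, bucket, old_key, new_key, acl)
        except Exception as exc:  # collect the failure instead of aborting the batch
            return (old_key, exc)
        return None

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return [r for r in pool.map(_one, renames) if r is not None]


# Hypothetical usage (bucket name and key mapping are placeholders):
#
#   import boto3
#   from botocore.config import Config
#
#   # The default connection pool holds 10 connections; size it to the workers.
#   client = boto3.client("s3", config=Config(max_pool_connections=32))
#   pairs = [("report.csv", "renamed/report.csv")]
#   failed = rename_many(client, "my-bucket", pairs, workers=32)
```

Returning the failures rather than raising on the first error lets a long batch keep going through throttling or timeouts, and the failed keys can simply be fed back into rename_many.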

AWS
answered 6 years ago
EXPERT
reviewed 23 days ago
