S3 copy from one folder to another

0

I have a requirement to copy a huge number of small files recursively from one folder under an S3 bucket to another folder path, either periodically or as soon as each file arrives in the source location.

Source location:
bucket_name_1/folder1/folder2/name123.csv
bucket_name_1/folder1/folder3/name127.csv
bucket_name_1/folder1/folder4/name128.csv

Target copy location:
bucket_name_1/folder_bkp/folder2/name123.csv
bucket_name_1/folder_bkp/folder3/name125.csv
bucket_name_1/folder_bkp/folder4/name126.csv

Can you please help me find a way to do this that is low cost, easy to set up, and efficient?

asked a year ago · 2259 views
3 Answers
2

Hi,

If your use case requires copying each file as soon as it is created in the source location, a common solution is to configure an S3 Event Notification that invokes a Lambda function to execute your logic (in this case, copying the file to another path).
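For illustration, here is a minimal Lambda sketch in Python (boto3). It assumes the prefixes from your question: objects arrive under folder1/ and should be copied to folder_bkp/ in the same bucket. A single notification can deliver more than one S3 record, so the handler loops over all of them.

    import boto3
    from urllib.parse import unquote_plus

    s3 = boto3.client("s3")

    SOURCE_PREFIX = "folder1/"      # assumed source folder
    TARGET_PREFIX = "folder_bkp/"   # assumed backup folder

    def lambda_handler(event, context):
        # One S3 event notification can carry several records
        for record in event.get("Records", []):
            bucket = record["s3"]["bucket"]["name"]
            key = unquote_plus(record["s3"]["object"]["key"])  # keys arrive URL-encoded
            if not key.startswith(SOURCE_PREFIX):
                continue  # ignore objects outside the watched folder
            target_key = TARGET_PREFIX + key[len(SOURCE_PREFIX):]
            # copy_object performs a single server-side copy (objects up to 5 GB)
            s3.copy_object(
                Bucket=bucket,
                Key=target_key,
                CopySource={"Bucket": bucket, "Key": key},
            )

Larger objects would need a multipart copy, but for a large number of small files this is the simplest path.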

However, if the number of files is huge, it is better to send the notifications from S3 to SQS and then process the events in batches through the Lambda function.
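When the notification goes through SQS, each SQS message body carries the S3 event as JSON, so the handler first unwraps the batch of messages and then applies the same copy logic. A sketch, assuming the queue receives the raw S3 notifications; copy_to_backup is a hypothetical helper with the same logic as above:

    import json
    import boto3
    from urllib.parse import unquote_plus

    s3 = boto3.client("s3")

    def copy_to_backup(bucket, key, source_prefix="folder1/", target_prefix="folder_bkp/"):
        # Hypothetical helper: server-side copy into the backup prefix
        if key.startswith(source_prefix):
            s3.copy_object(
                Bucket=bucket,
                Key=target_prefix + key[len(source_prefix):],
                CopySource={"Bucket": bucket, "Key": key},
            )

    def lambda_handler(event, context):
        # Each SQS record body is an S3 event notification serialized as JSON
        for sqs_record in event.get("Records", []):
            body = json.loads(sqs_record["body"])
            # s3:TestEvent messages have no "Records" key, hence the default
            for s3_record in body.get("Records", []):
                bucket = s3_record["s3"]["bucket"]["name"]
                key = unquote_plus(s3_record["s3"]["object"]["key"])
                copy_to_backup(bucket, key)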

Note: As described in the official documentation, event notifications are delivered in seconds but can sometimes take a minute or longer.

EXPERT
answered a year ago
  • Yes, this can be used, but as I mentioned earlier the source files will be huge in size and number, which can throttle the Lambda function and run into problems such as the concurrency limit.

  • Totally agree! I just edited the answer but you had answered before.

  • Hi Mikel, thanks for this approach. Where is that batch option configured, in SQS or in Lambda? The goal is that Lambda is not triggered for every single small file; instead it should be triggered in batches of, say, 100 files.

  • Hi,

     

    The batch options are configured on the Lambda trigger, using the Batch size and Batch window properties. The next page shows how to do it step by step.

     

    The Batch size defines the number of records to send to the function in each batch, and the Batch window defines the maximum amount of time to gather records before invoking the function, so you should adjust them based on your use case.

     

    Your Lambda function will be invoked when one of these two thresholds is reached, that is, when the configured maximum batch size is reached or when the batch window expires.
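    As a sketch, the same two properties can also be set programmatically on the SQS trigger (event source mapping); the queue ARN and function name below are placeholders:

        import boto3

        lambda_client = boto3.client("lambda")

        # Hypothetical queue and function; both must already exist
        response = lambda_client.create_event_source_mapping(
            EventSourceArn="arn:aws:sqs:eu-west-1:123456789012:s3-copy-events",
            FunctionName="copy-to-backup",
            BatchSize=100,                      # up to 100 messages per invocation
            MaximumBatchingWindowInSeconds=60,  # or invoke after 60 s, whichever comes first
        )
        print(response["UUID"])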

1

Have you looked at S3 replication - https://docs.aws.amazon.com/AmazonS3/latest/userguide/replication.html? It removes the need for any customised code, and objects can be replicated within 15 minutes of being created in the source bucket.
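Note that replication preserves the object key, so it is usually paired with a separate destination bucket rather than a renamed prefix like folder_bkp/ in the same bucket, and it requires versioning on both buckets plus an IAM role that S3 can assume. A minimal sketch, assuming a pre-created role and a hypothetical bucket_name_1_bkp destination bucket:

    import boto3

    s3 = boto3.client("s3")

    # Versioning must already be enabled on both buckets
    s3.put_bucket_replication(
        Bucket="bucket_name_1",
        ReplicationConfiguration={
            "Role": "arn:aws:iam::123456789012:role/s3-replication-role",  # hypothetical role
            "Rules": [
                {
                    "ID": "backup-folder1",
                    "Priority": 1,
                    "Status": "Enabled",
                    "Filter": {"Prefix": "folder1/"},  # only replicate the source folder
                    "DeleteMarkerReplication": {"Status": "Disabled"},
                    "Destination": {"Bucket": "arn:aws:s3:::bucket_name_1_bkp"},
                }
            ],
        },
    )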

answered a year ago
0

Please take a look at this article: https://repost.aws/knowledge-center/s3-large-transfer-between-buckets

Another option mentioned there is the S3DistCp tool, which is suitable for periodic transfers. For the "large number of small files" use case, the --srcPrefixesFile option could be useful since it allows you to provide a list of prefixes (folders) to copy, so the tool won't have to list everything in the source bucket. S3DistCp runs on the EMR service, so you have to consider its cost, but it may be quite efficient (i.e. quick) compared to alternative solutions.
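For a rough idea, an S3DistCp step can be submitted to an existing EMR cluster; the cluster id and the prefixes file below are hypothetical, and the file referenced by --srcPrefixesFile would contain one source prefix per line:

    import boto3

    emr = boto3.client("emr")

    emr.add_job_flow_steps(
        JobFlowId="j-XXXXXXXXXXXXX",  # hypothetical existing EMR cluster
        Steps=[
            {
                "Name": "copy-to-backup",
                "ActionOnFailure": "CONTINUE",
                "HadoopJarStep": {
                    "Jar": "command-runner.jar",
                    "Args": [
                        "s3-dist-cp",
                        "--src=s3://bucket_name_1/folder1/",
                        "--dest=s3://bucket_name_1/folder_bkp/",
                        "--srcPrefixesFile=s3://bucket_name_1/config/prefixes.txt",
                    ],
                },
            }
        ],
    )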

nikos64
answered a year ago
