How to upload 38TB to S3 efficiently?

0

Hi. We would like to upload 38 TB in total (a few thousand files; the largest files are 45 GB). We can use any tool available, aws s3 cp --recursive or equivalent. How would you go about saturating the bandwidth as much as possible? The combined outbound bandwidth is around 100-1000 GB/s, but we will probably have to run this concurrently from multiple machines to achieve this throughput.

3 Answers
1

Populate an SQS queue with the list of files to upload.

Write a client that uses the AWS SDK to read from the queue and upload the file named in each message to S3.

Run the client across multiple servers to maximize the number of files being uploaded in parallel. Since each client gets its file names from queue messages, you should not have conflicts or duplicates.
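
As a rough sketch (not part of the original answer), such a worker could look like the following in Python with boto3. The queue URL, bucket name, key scheme, and the assumption that each message body is simply a local file path are all illustrative choices:

```python
import os

import boto3
from boto3.s3.transfer import TransferConfig

QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/upload-queue"  # hypothetical
BUCKET = "my-target-bucket"  # hypothetical

sqs = boto3.client("sqs")
s3 = boto3.client("s3")

# Multipart settings: a large part size and many threads help saturate the
# link when uploading 45 GB objects.
transfer_config = TransferConfig(
    multipart_threshold=64 * 1024 * 1024,
    multipart_chunksize=64 * 1024 * 1024,
    max_concurrency=16,
    use_threads=True,
)

def run_worker() -> None:
    while True:
        # Long-poll the queue for the next file to upload.
        resp = sqs.receive_message(
            QueueUrl=QUEUE_URL,
            MaxNumberOfMessages=1,
            WaitTimeSeconds=20,
        )
        messages = resp.get("Messages", [])
        if not messages:
            continue  # queue is empty right now; keep polling

        msg = messages[0]
        local_path = msg["Body"]  # assumed: message body is the file path
        key = os.path.relpath(local_path, "/data")  # hypothetical key scheme

        # upload_file performs a managed multipart upload under the hood.
        s3.upload_file(local_path, BUCKET, key, Config=transfer_config)

        # Delete the message only after the upload succeeds, so a failed
        # worker leaves the file to be retried by another worker.
        sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])

if __name__ == "__main__":
    run_worker()
```

Running one or more copies of this worker per machine, across several machines, spreads the files over many parallel connections without any coordination beyond the queue.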

Hope this helps.

AWS
EXPERT
answered 24 days ago
EXPERT
reviewed 24 days ago
  • This is a great idea. Be wary, though: the maximum SQS visibility timeout is 12 hours, so if a file takes longer than 12 hours to upload, the message will reappear in the queue and be processed again.
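
One hedged way to mitigate this (an assumption, not part of the comment above) is to have the worker periodically extend the message's visibility from a heartbeat thread while the upload is in flight, keeping in mind that SQS still caps the total visibility at 12 hours per receive:

```python
import threading

import boto3

sqs = boto3.client("sqs")

def keep_invisible(queue_url: str, receipt_handle: str, stop: threading.Event) -> None:
    """Heartbeat: re-extend message visibility every 10 minutes while uploading.

    SQS still enforces the 12-hour maximum per receive, so uploads that may
    run even longer also need an idempotent key scheme (re-uploading to the
    same S3 key simply overwrites the object on a duplicate delivery).
    """
    while not stop.wait(timeout=600):
        sqs.change_message_visibility(
            QueueUrl=queue_url,
            ReceiptHandle=receipt_handle,
            VisibilityTimeout=900,  # hide the message for another 15 minutes
        )
```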

1

Have you considered using AWS Snowball as a one-off?

https://aws.amazon.com/snowball/

EXPERT
answered 24 days ago
0

You can consider using AWS DataSync to transfer from NFS, SMB, HDFS, or object storage to Amazon S3. It works by deploying an agent on-premises and configuring a "Task" that specifies the folders to copy to an S3 bucket. Each agent can execute a single task at a time. DataSync takes care of discovering the files, parallelization, file verification during transfer, and more.

You can also configure the agent to use multiple NICs to maximize throughput, or create multiple tasks by filtering down to specific folders and assigning each task to a different agent deployed on-premises.
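
As an illustrative sketch (the hostnames, ARNs, bucket, and folder names below are placeholders, not from the answer), the same setup can be scripted with boto3 once an agent has been deployed and activated:

```python
import boto3

datasync = boto3.client("datasync")

# Placeholder ARN; the agent must already be deployed on-premises and
# activated before these calls will succeed.
AGENT_ARN = "arn:aws:datasync:us-east-1:123456789012:agent/agent-0123456789abcdef0"

# Source location: the on-premises NFS share the agent can reach.
nfs = datasync.create_location_nfs(
    ServerHostname="nfs.example.internal",
    Subdirectory="/export/data",
    OnPremConfig={"AgentArns": [AGENT_ARN]},
)

# Destination location: the S3 bucket, accessed via an IAM role DataSync assumes.
s3_loc = datasync.create_location_s3(
    S3BucketArn="arn:aws:s3:::my-target-bucket",
    Subdirectory="/",
    S3Config={"BucketAccessRoleArn": "arn:aws:iam::123456789012:role/datasync-s3-access"},
)

# The task ties the two locations together; an include filter narrows it to
# one folder so additional tasks (on other agents) can cover other folders.
task = datasync.create_task(
    SourceLocationArn=nfs["LocationArn"],
    DestinationLocationArn=s3_loc["LocationArn"],
    Name="upload-folder-1",
    Includes=[{"FilterType": "SIMPLE_PATTERN", "Value": "/folder1"}],
)

# Kick off one execution of the task.
datasync.start_task_execution(TaskArn=task["TaskArn"])
```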

AWS
answered 16 days ago
