
How to upload 38TB to S3 efficiently?


Hi. We would like to upload 38 TB in total (a few thousand files; the largest are 45 GB). We can use any tool, `aws s3 cp --recursive` or equivalent. How would you go about saturating the bandwidth as much as possible? Our combined outbound bandwidth is around 100-1000 GB/s, but we will probably have to run this concurrently from multiple machines to achieve that throughput.

3 Answers

Populate an SQS queue with one message per file.

Write a client that uses the AWS SDK to read from the queue and upload the file in the message to S3.

Run the client across multiple servers to maximize the number of files being uploaded. Since the client is getting the file names from the queue message, you should not have conflicts/duplicates.

Hope this helps.

answered a year ago
  • This is a great idea. Be wary: the maximum SQS visibility timeout is 12 hours, so if a file takes longer than 12 hours to upload, the message will reappear in the queue and be processed again.


Have you considered using AWS Snowball as a one-off?

https://aws.amazon.com/snowball/

answered a year ago

You can consider using AWS DataSync to transfer from NFS, SMB, HDFS, or object storage to Amazon S3. It works by deploying an agent on-premises and configuring a "task" that specifies the folders to copy to an S3 bucket. Each agent can execute a single task at a time. DataSync takes care of discovering the files, parallelization, integrity checks during transfer, and more.

You can also configure the agent to use multiple NICs to maximize throughput, or create multiple tasks by filtering on folders and assigning each task to a different agent deployed on-premises.

answered a year ago
