- Populate an SQS queue with the list of files to transfer.
- Write a client that uses the AWS SDK to read messages from the queue and upload the file named in each message to S3 (a minimal sketch follows below).
- Run the client across multiple servers to maximize the number of files being uploaded in parallel. Since each client gets its file names from queue messages, you should not see conflicts or duplicates.
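A minimal sketch of such a client in Python with boto3, assuming each message body holds one local file path; the queue URL and bucket name here are placeholders you would replace with your own:

```python
import boto3

# Hypothetical names; substitute your own queue URL and bucket.
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/file-transfer-queue"
BUCKET = "my-destination-bucket"

sqs = boto3.client("sqs")
s3 = boto3.client("s3")

while True:
    # Long-poll for up to 10 messages at a time.
    resp = sqs.receive_message(
        QueueUrl=QUEUE_URL,
        MaxNumberOfMessages=10,
        WaitTimeSeconds=20,
    )
    for msg in resp.get("Messages", []):
        path = msg["Body"]  # assumes each body is one local file path
        # Upload the file; the S3 key mirrors the local path.
        s3.upload_file(path, BUCKET, path.lstrip("/"))
        # Delete the message only after a successful upload, so a
        # failed worker leaves it visible for another client to retry.
        sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])
```

Because SQS handles the locking via message visibility, you can run as many copies of this loop as your network can feed.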
Hope this helps.
Have you considered using AWS Snowball as a one-off?
You can consider using AWS DataSync to transfer from NFS, SMB, HDFS, or object storage to Amazon S3. This works by deploying an agent on-premises and configuring a "Task" that includes the folders to copy to an S3 bucket. Each agent can execute a single task at a time. DataSync takes care of discovering the files, parallelization, file verification during transfer, and more.
You can also configure the agent to use multiple NICs to maximize throughput, or create multiple tasks by filtering down to subsets of folders and assigning each task to a different agent deployed on-premises (a sketch of the task setup follows below).
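A rough sketch of creating a folder-filtered task with boto3; the location ARNs are placeholders for locations you would have created beforehand (e.g. with create_location_nfs and create_location_s3), and the agent assignment itself happens via the AgentArns on the source location:

```python
import boto3

datasync = boto3.client("datasync")

# Hypothetical ARNs for pre-created source and destination locations.
SOURCE_ARN = "arn:aws:datasync:us-east-1:123456789012:location/loc-source"
DEST_ARN = "arn:aws:datasync:us-east-1:123456789012:location/loc-dest"

# One task per folder subset lets you spread the work across agents.
task = datasync.create_task(
    SourceLocationArn=SOURCE_ARN,
    DestinationLocationArn=DEST_ARN,
    Name="transfer-folder1",
    # An include filter restricts this task to a subset of folders.
    Includes=[{"FilterType": "SIMPLE_PATTERN", "Value": "/folder1"}],
)
datasync.start_task_execution(TaskArn=task["TaskArn"])
```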
This is a great idea. Be wary, though: the maximum SQS visibility timeout is 12 hours, so if a file takes longer than that to upload, the message will reappear in the queue and be processed again. One mitigation is to extend the visibility timeout from a heartbeat while the upload runs (sketched below).
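A minimal heartbeat sketch, assuming the queue URL and receipt handle come from the receive_message call in the uploader above; note the 12-hour cap applies to the message's total visibility from receipt, so truly enormous files may still need multipart chunking or a different mechanism:

```python
import threading
import boto3

sqs = boto3.client("sqs")

def keep_alive(queue_url, receipt_handle, stop_event, interval=300):
    """Periodically extend the message's visibility while an upload runs."""
    # wait() returns False on timeout, True once the event is set.
    while not stop_event.wait(interval):
        # Push the visibility window out another 10 minutes.
        sqs.change_message_visibility(
            QueueUrl=queue_url,
            ReceiptHandle=receipt_handle,
            VisibilityTimeout=600,
        )

# Usage around a single upload (queue_url/handle from receive_message):
# stop = threading.Event()
# t = threading.Thread(target=keep_alive, args=(queue_url, handle, stop))
# t.start()
# try:
#     s3.upload_file(path, BUCKET, key)
# finally:
#     stop.set()
#     t.join()
```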