See this blog post: https://aws.amazon.com/blogs/storage/cross-account-bulk-transfer-of-files-using-amazon-s3-batch-operations/
Similar to the blog, my recommendation would be to use S3 Inventory to get a list of the files in the bucket, then do some scripting (on an EC2 instance close to the S3 data) to bundle files from the inventory list into zip archives in another bucket. The goal is to end up with far fewer but larger files (perhaps 1 GB each). Once you have fewer, larger files, proceed with the download. That way your bandwidth goes into meaningful transfers rather than millions of connects and disconnects.
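For illustration, a rough sketch of what that scripting could look like, assuming you've already reduced the S3 Inventory report to a keys.txt file with one object key per line; the bucket names and the batch size are placeholders, not values from your setup:

# Minimal sketch: bundle small objects listed in keys.txt into larger zip archives.
# SRC_BUCKET, DST_BUCKET and the 100000-keys-per-batch figure are placeholders;
# at ~8 KiB per object, 100000 keys gives archives of very roughly 800 MB.
SRC_BUCKET=my-source-bucket
DST_BUCKET=my-archive-bucket

split -l 100000 keys.txt batch_            # one key list per archive
for batch in batch_*; do
  mkdir -p work
  while read -r key; do
    mkdir -p "work/$(dirname "$key")"
    aws s3 cp "s3://$SRC_BUCKET/$key" "work/$key"   # this inner loop can be parallelised, e.g. with xargs -P
  done < "$batch"
  (cd work && zip -qr "../$batch.zip" .)            # pack the whole batch into one archive
  rm -rf work
  aws s3 cp "$batch.zip" "s3://$DST_BUCKET/"        # upload the large archive
done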
Hope this helps.
The number of files isn't astronomically large, but it's certainly huge for a single-threaded PHP script to process one file at a time. You can speed the copying up by a couple of orders of magnitude by using the AWS Command Line Interface (AWS CLI) instead. Since your files average only about 8 KiB in size, you could set max_concurrent_requests to 128 to start with for roughly a hundred-fold performance increase, and optionally experiment with larger values if you'd like. Also set max_queue_size to 10,000. These settings are documented at https://awscli.amazonaws.com/v2/documentation/api/latest/topic/s3-config.html
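For example, you can apply those values to the default profile with aws configure set, which is equivalent to editing the s3 section of ~/.aws/config by hand:

# Tune the CLI's S3 transfer settings for the default profile
aws configure set default.s3.max_concurrent_requests 128
aws configure set default.s3.max_queue_size 10000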
Then download the files like so:
aws s3 sync s3://my-bucket-name /local-destination
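One optional tweak, not strictly required: with millions of objects the per-file progress output can add noticeable overhead, and the --only-show-errors flag suppresses it:

# Same sync, but only errors and warnings are printed to the console
aws s3 sync s3://my-bucket-name /local-destination --only-show-errors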
You can do this on an EC2 instance in the same region as the bucket, as iBehr correctly suggested, but by utilising the built-in parallelisation capability of the AWS CLI/SDK you'll get massively improved performance from any location, without intermediate steps or additional infrastructure, compared to the single-threaded copy.
If you choose to copy the files to an EC2 instance first, I suggest you make sure you have a VPC gateway endpoint for S3 in the VPC before starting the copy. It'll avoid the added cost of running the transfer traffic through a NAT gateway.
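If you still need to create that endpoint, it's a one-time step; here's a sketch with placeholder VPC, route table and region values, which you'd substitute with your own:

# Create an S3 gateway endpoint so EC2 <-> S3 traffic stays off the NAT gateway.
# vpc-0123456789abcdef0, rtb-0123456789abcdef0 and eu-west-1 are placeholders.
aws ec2 create-vpc-endpoint \
  --vpc-id vpc-0123456789abcdef0 \
  --vpc-endpoint-type Gateway \
  --service-name com.amazonaws.eu-west-1.s3 \
  --route-table-ids rtb-0123456789abcdef0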
I agree: zipping your multiple files into bigger archives will accelerate the transfer.