I use the AWS Command Line Interface (AWS CLI) sync command to transfer data to or between Amazon Simple Storage Service (Amazon S3) buckets. However, the transfer takes a long time to complete.
Short description
The number of objects in the source and destination buckets can affect the time that it takes the sync command to complete. The transfer size also affects the duration of the sync and the cost of the requests to Amazon S3.
Resolution
Note: If you receive errors when you run AWS CLI commands, then see Troubleshooting errors for the AWS CLI. Also, make sure that you're using the most recent AWS CLI version.
Remove expired delete markers
Because the sync command runs list API calls on the backend, delete markers impact the performance of the sync command. It's a best practice to minimize the number of delete markers. You can use an S3 Lifecycle configuration rule to automatically remove expired delete markers in a versioning-activated bucket.
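For example, the following lifecycle configuration removes expired object delete markers from the entire bucket. This is a minimal sketch that assumes a versioning-activated bucket named source-AWSDOC-EXAMPLE-BUCKET and a local file named lifecycle.json that contains the rule:
{
  "Rules": [
    {
      "ID": "remove-expired-delete-markers",
      "Filter": {},
      "Status": "Enabled",
      "Expiration": {
        "ExpiredObjectDeleteMarker": true
      }
    }
  ]
}
aws s3api put-bucket-lifecycle-configuration --bucket source-AWSDOC-EXAMPLE-BUCKET --lifecycle-configuration file://lifecycle.json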
Run multiple AWS CLI operations
To copy a large amount of data, run separate sync operations in parallel. The following example command runs parallel sync operations for different prefixes:
aws s3 sync s3://source-AWSDOC-EXAMPLE-BUCKET/folder1 s3://destination-AWSDOC-EXAMPLE-BUCKET/folder1
aws s3 sync s3://source-AWSDOC-EXAMPLE-BUCKET/folder2 s3://destination-AWSDOC-EXAMPLE-BUCKET/folder2
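The sync command doesn't return until the transfer completes. To run the operations at the same time, run each command in a separate terminal, or start each command as a background job. The following sketch uses the same example prefixes and assumes a Linux or macOS shell:
aws s3 sync s3://source-AWSDOC-EXAMPLE-BUCKET/folder1 s3://destination-AWSDOC-EXAMPLE-BUCKET/folder1 &
aws s3 sync s3://source-AWSDOC-EXAMPLE-BUCKET/folder2 s3://destination-AWSDOC-EXAMPLE-BUCKET/folder2 &
wait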
Or, run parallel sync operations for separate exclude and include filters. The following example operations separate the files to sync by key names that begin with numbers 0 through 4, and numbers 5 through 9:
aws s3 sync s3://source-AWSDOC-EXAMPLE-BUCKET/ s3://destination-AWSDOC-EXAMPLE-BUCKET/ --exclude "*" --include "0*" --include "1*" --include "2*" --include "3*" --include "4*"
aws s3 sync s3://source-AWSDOC-EXAMPLE-BUCKET/ s3://destination-AWSDOC-EXAMPLE-BUCKET/ --exclude "*" --include "5*" --include "6*" --include "7*" --include "8*" --include "9*"
Note: Even when you use exclude and include filters, the sync command still reviews all files in the source bucket. The review identifies the source files to copy to the destination bucket. If you have multiple sync operations for different key name prefixes, then each sync operation reviews all the source files. However, because of the exclude and include filters, Amazon S3 copies only the files that you include in the filters to the destination bucket.
For more information about how to optimize the performance of your workload, see Best practices design patterns: optimizing Amazon S3 performance.
Activate S3 Transfer Acceleration
Use S3 Transfer Acceleration to improve transfer speeds over long geographic distances between your client and the S3 bucket.
To review pricing for S3 Transfer Acceleration, choose the Data transfer tab on the Amazon S3 pricing page. To determine whether S3 Transfer Acceleration improves your transfer speeds, use the Amazon S3 Transfer Acceleration Speed Comparison tool.
Note: With S3 Transfer Acceleration, you can't use the CopyObject action across AWS Regions.
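For example, to use an accelerated endpoint with the sync command, first activate Transfer Acceleration on the bucket, and then configure the AWS CLI to use the accelerate endpoint. The following commands are a sketch that assumes a bucket named destination-AWSDOC-EXAMPLE-BUCKET:
aws s3api put-bucket-accelerate-configuration --bucket destination-AWSDOC-EXAMPLE-BUCKET --accelerate-configuration Status=Enabled
aws configure set default.s3.use_accelerate_endpoint true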
Modify the AWS CLI configuration values
max_concurrent_requests
The max_concurrent_requests value sets the number of requests that the AWS CLI can send to Amazon S3 at the same time. The default value is 10. To improve performance, increase the value.
Important:
- When you run more threads, you use more resources on your machine. Make sure that your machine has enough resources to support your maximum number of concurrent requests.
- Too many concurrent requests might cause connection timeouts or slow the system's responsiveness. To avoid timeout issues from the AWS CLI, set the --cli-read-timeout option or the --cli-connect-timeout option to 0.
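For example, the following commands increase the number of concurrent requests to 20 and then run the sync command with the read timeout turned off. The value 20 is only an example; choose a value that your machine's CPU, memory, and network bandwidth can support:
aws configure set default.s3.max_concurrent_requests 20
aws s3 sync s3://source-AWSDOC-EXAMPLE-BUCKET/ s3://destination-AWSDOC-EXAMPLE-BUCKET/ --cli-read-timeout 0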
multipart_threshold
When a file's size exceeds the multipart_threshold value, the AWS CLI uses a multipart upload instead of a single operation. The default value for multipart_threshold is 8 MB. To increase the default value, run the following command:
aws configure set default.s3.multipart_threshold 16MB
Note: Replace 16MB with your multipart threshold size.
multipart_chunksize
The default value for multipart_chunksize is 8 MB and the minimum value is 5 MB. To increase the chunk size, run the following command:
aws configure set default.s3.multipart_chunksize 16MB
Note: Replace 16MB with your new chunk size.
For large objects, it's a best practice to set the multipart_threshold to 100 MB so that only large files use multipart uploads. It's also a best practice to set the multipart_chunksize to 25 MB to balance between efficient uploads and manageable part sizes.
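For example, the following commands apply those values to the default profile:
aws configure set default.s3.multipart_threshold 100MB
aws configure set default.s3.multipart_chunksize 25MB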
(Optional) Check your Amazon EC2 instance configuration
If you run sync from an Amazon Elastic Compute Cloud (Amazon EC2) instance, then check the instance configuration to improve performance. For example, confirm that the instance type provides enough network bandwidth for the transfer and that the instance is in the same AWS Region as the S3 buckets.
Related information
How can I improve the transfer speeds for copying data between my S3 bucket and EC2 instance?
How do I transfer large amounts of data from one Amazon S3 bucket to another?
How do I troubleshoot slow or inconsistent speeds when I download or upload data to Amazon S3 from an on-premises client?