I use the AWS Command Line Interface (AWS CLI) sync command to transfer data in Amazon Simple Storage Service (Amazon S3). However, the transfer takes a long time to complete.
Resolution
The sync command compares the source and destination buckets to determine which source files don't exist in the destination bucket. The sync command also determines which source files were modified, compared to the files in the destination bucket. Then, it copies the new or updated source files to the destination bucket.
The number of objects in the source and destination buckets might affect the time that the sync command takes to complete. The transfer size can also affect the duration of the sync and the cost that you incur from requests to Amazon S3.
Delete markers affect list performance. Because the sync command runs List API calls in the backend, a large number of delete markers also slows down the sync command. It's a best practice to minimize the number of delete markers.
To improve transfer time when you run the sync command, implement the following practices.
Run multiple instances of the AWS CLI
To copy a large amount of data, run multiple instances of the AWS CLI to perform separate sync operations in parallel. For example, you can run parallel sync operations for different prefixes:
aws s3 sync s3://source-AWSDOC-EXAMPLE-BUCKET/folder1 s3://destination-AWSDOC-EXAMPLE-BUCKET/folder1
aws s3 sync s3://source-AWSDOC-EXAMPLE-BUCKET/folder2 s3://destination-AWSDOC-EXAMPLE-BUCKET/folder2
Note: If you receive errors when you run AWS CLI commands, make sure that you're using the most recent AWS CLI version.
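If you enter the preceding commands in the same terminal session, they run one after the other. To run them at the same time, you can start each sync as a background job. The following sketch assumes a Bash shell and the same example bucket and prefix names:
# Start each prefix sync as a background job, then wait for both jobs to finish
aws s3 sync s3://source-AWSDOC-EXAMPLE-BUCKET/folder1 s3://destination-AWSDOC-EXAMPLE-BUCKET/folder1 &
aws s3 sync s3://source-AWSDOC-EXAMPLE-BUCKET/folder2 s3://destination-AWSDOC-EXAMPLE-BUCKET/folder2 &
wait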
Or, run parallel sync operations with separate exclude and include filters. For example, the following operations separate the files to sync by key names that begin with the numbers 0 through 4 and the numbers 5 through 9:
Note: Even when you use exclude and include filters, the sync command still reviews all of the files in the source bucket to determine which source files to copy to the destination bucket. If you run multiple sync operations that target different key name prefixes, then each sync operation reviews all of the source files. However, because of the exclude and include filters, only the files that match the include filters are copied to the destination bucket.
aws s3 sync s3://source-AWSDOC-EXAMPLE-BUCKET/ s3://destination-AWSDOC-EXAMPLE-BUCKET/ --exclude "*" --include "0*" --include "1*" --include "2*" --include "3*" --include "4*"
aws s3 sync s3://source-AWSDOC-EXAMPLE-BUCKET/ s3://destination-AWSDOC-EXAMPLE-BUCKET/ --exclude "*" --include "5*" --include "6*" --include "7*" --include "8*" --include "9*"
For more information on optimizing the performance of your workload, see Best practices design patterns: optimizing Amazon S3 performance.
Modify the AWS CLI configuration value for max_concurrent_requests
To potentially improve performance, modify the value of max_concurrent_requests. This value sets the number of requests that the AWS CLI can send to Amazon S3 at a time. The default value is 10, and you can increase it. However, note the following limitations:
- Running more threads consumes more resources on your machine. You must be sure that your machine has enough resources to support the maximum number of concurrent requests that you want.
- Too many concurrent requests might overwhelm your system and cause connection timeouts or slow system responsiveness. To avoid timeout issues from the AWS CLI, set the --cli-read-timeout value or the --cli-connect-timeout value to 0, as shown in the example after this list.
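For example, you can set max_concurrent_requests for the default profile with the aws configure set command. The value of 20 below is only an illustration; choose a value that your machine's CPU, memory, and network can support:
# Raise the number of concurrent Amazon S3 requests for the default profile
aws configure set default.s3.max_concurrent_requests 20

# Turn off the AWS CLI read and connect timeouts for a long-running sync
aws s3 sync s3://source-AWSDOC-EXAMPLE-BUCKET/ s3://destination-AWSDOC-EXAMPLE-BUCKET/ --cli-read-timeout 0 --cli-connect-timeout 0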
(Optional) Check the instance configuration
If you use an Amazon Elastic Compute Cloud (Amazon EC2) instance to run the sync operation, then use the following best practices:
- Review your instance type. Larger instance types can provide better results because they have higher network bandwidth and are Amazon Elastic Block Store (Amazon EBS) optimized.
- If the instance is in a different AWS Region than the bucket, then use an instance in the same Region. To reduce latency, reduce the geographical distance between the instance and your Amazon S3 bucket.
- If the instance is in the same Region as the source bucket, then set up an Amazon Virtual Private Cloud (Amazon VPC) endpoint for Amazon S3, as shown in the example after this list. VPC endpoints can help improve overall performance.
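For example, the following command sketch creates a gateway VPC endpoint for Amazon S3. The VPC ID, route table ID, and Region (us-east-1) are placeholders that you must replace with your own values:
# Create a gateway VPC endpoint for Amazon S3 in the VPC that hosts the instance
aws ec2 create-vpc-endpoint \
  --vpc-id vpc-0abc1234example \
  --vpc-endpoint-type Gateway \
  --service-name com.amazonaws.us-east-1.s3 \
  --route-table-ids rtb-0abc1234example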
Related information
How can I improve the transfer speeds for copying data between my S3 bucket and EC2 instance?
What's the best way to transfer large amounts of data from one Amazon S3 bucket to another?