To make use of multiple CPUs when syncing S3 buckets, you can improve the performance of the aws s3 sync command by running several sync operations in parallel, each with its own exclude and include filters. While this approach doesn't directly drive all 80 CPUs, it can significantly improve transfer performance.
Here's a strategy you can employ:
- Divide your files into groups based on their prefixes or patterns.
- Run multiple sync commands simultaneously, each targeting a specific group of files.
For example, you could split your sync operations based on file name prefixes:
aws s3 sync s3://source-bucket/ s3://destination-bucket/ --exclude "*" --include "0*" --include "1*" &
aws s3 sync s3://source-bucket/ s3://destination-bucket/ --exclude "*" --include "2*" --include "3*" &
aws s3 sync s3://source-bucket/ s3://destination-bucket/ --exclude "*" --include "4*" --include "5*" &
aws s3 sync s3://source-bucket/ s3://destination-bucket/ --exclude "*" --include "6*" --include "7*" &
aws s3 sync s3://source-bucket/ s3://destination-bucket/ --exclude "*" --include "8*" --include "9*" &
Each command runs in the background (denoted by the trailing &), allowing parallel execution. You can create more granular splits to increase parallelism further.
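If you'd rather not write out every command by hand, the same pattern can be wrapped in a small shell script that launches one background sync per prefix group and waits for all of them to finish. This is only a sketch; the bucket names and prefix groupings are placeholders you would adapt to your own data layout:

```bash
#!/usr/bin/env bash
# Sketch only: launch one background "aws s3 sync" per prefix group, then wait for all of them.
# Bucket names and prefix groupings are placeholders -- adapt them to your own data.

SRC="s3://source-bucket/"
DST="s3://destination-bucket/"

# Each entry lists the leading characters handled by one parallel sync process.
PREFIX_GROUPS=("0 1" "2 3" "4 5" "6 7" "8 9")

for group in "${PREFIX_GROUPS[@]}"; do
  includes=()
  for prefix in $group; do            # rely on word splitting of the space-separated entry
    includes+=(--include "${prefix}*")
  done
  # --exclude "*" first, then the --include filters, exactly as in the commands above.
  aws s3 sync "$SRC" "$DST" --exclude "*" "${includes[@]}" &
done

wait    # block until every background sync has finished
echo "All prefix groups synced."
```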
Keep in mind that even when using exclude and include filters, each sync command still reviews all files in the source bucket to determine which files to copy. However, only the files matching the include filters will be transferred to the destination bucket.
For optimal performance, consider factors such as network bandwidth, S3 request rates, and the nature of your data. You may need to experiment with different grouping strategies to find the most efficient approach for your specific use case.
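Much of that experimentation happens through the AWS CLI's own S3 transfer settings. The snippet below is a minimal sketch of changing them with aws configure set; the numbers are illustrative starting points to experiment with, not tuned recommendations for any particular machine. Note that each parallel sync process gets its own copy of these settings, so the effective concurrency is roughly max_concurrent_requests multiplied by the number of parallel commands.

```bash
# Illustrative values only -- measure and adjust; these are not tuned recommendations.
aws configure set default.s3.max_concurrent_requests 20    # concurrent S3 requests per process
aws configure set default.s3.max_queue_size 10000          # tasks queued ahead of the transfer workers
aws configure set default.s3.multipart_threshold 64MB      # objects above this size use multipart transfers
aws configure set default.s3.multipart_chunksize 16MB      # size of each multipart part
aws configure set default.s3.max_bandwidth 100MB/s         # optional per-process throughput cap
```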
Alternatively, for more advanced multi-account and cross-region scenarios, you might want to explore using AWS DataSync, which is designed for high-performance data transfers and can automatically optimize resource utilization.
Sources
- Improve transfer performance of sync command in Amazon S3 | AWS re:Post
- Tutorial: Transferring data between Amazon S3 buckets across AWS accounts - AWS DataSync
Thanks a lot. To sum up, I can increase performance by tuning max_concurrent_requests, max_queue_size, multipart_threshold, multipart_chunksize, and max_bandwidth. Do you have best practices for choosing good parameter values for a specific hardware configuration? I have 80 CPUs at 2.5 GHz.