
Multi-CPU for sync bucket


I have 80 CPUs. How can I run aws s3 sync source destination on all of them?

1 Answer

To utilize multiple CPUs for syncing S3 buckets, you can improve the performance of the aws s3 sync command by running parallel sync operations with separate exclude and include filters. While this approach doesn't directly use all 80 CPUs, it can significantly enhance transfer performance.

Here's a strategy you can employ:

  1. Divide your files into groups based on their prefixes or patterns.
  2. Run multiple sync commands simultaneously, each targeting a specific group of files.

For example, you could split your sync operations based on file name prefixes:

aws s3 sync s3://source-bucket/ s3://destination-bucket/ --exclude "*" --include "0*" --include "1*" &
aws s3 sync s3://source-bucket/ s3://destination-bucket/ --exclude "*" --include "2*" --include "3*" &
aws s3 sync s3://source-bucket/ s3://destination-bucket/ --exclude "*" --include "4*" --include "5*" &
aws s3 sync s3://source-bucket/ s3://destination-bucket/ --exclude "*" --include "6*" --include "7*" &
aws s3 sync s3://source-bucket/ s3://destination-bucket/ --exclude "*" --include "8*" --include "9*" &

Each command runs in the background (denoted by the & at the end), allowing for parallel execution. You can create more granular splits to increase parallelism further.
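The five commands above can also be generated in a loop, which makes it easier to change the number of splits later. This is a sketch only: the bucket names are placeholders, and wait blocks until every background sync finishes.

```shell
#!/bin/sh
# Sketch only: bucket names are placeholders -- substitute your own.
SRC="s3://source-bucket/"
DST="s3://destination-bucket/"

# One background sync per pair of leading characters, mirroring the
# commands above; add more (or narrower) groups for more parallelism.
for pair in "0 1" "2 3" "4 5" "6 7" "8 9"; do
  set -- $pair                      # $1 and $2 become the two prefixes
  aws s3 sync "$SRC" "$DST" --exclude "*" \
      --include "${1}*" --include "${2}*" &
done

wait  # block until all background syncs have completed
```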

Keep in mind that even when using exclude and include filters, each sync command still reviews all files in the source bucket to determine which files to copy. However, only the files matching the include filters will be transferred to the destination bucket.
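If your keys already live under genuine common prefixes (the year-style prefixes below are purely illustrative), you can avoid the repeated full-bucket listing by scoping each sync to its own prefix, since each command then lists only its portion of the bucket:

```shell
# Illustrative prefixes -- each sync lists and copies only its own subtree.
aws s3 sync s3://source-bucket/2023/ s3://destination-bucket/2023/ &
aws s3 sync s3://source-bucket/2024/ s3://destination-bucket/2024/ &
wait
```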

For optimal performance, consider factors such as network bandwidth, S3 request rates, and the nature of your data. You may need to experiment with different grouping strategies to find the most efficient approach for your specific use case.

Alternatively, for more advanced multi-account and cross-region scenarios, you might want to explore using AWS DataSync, which is designed for high-performance data transfers and can automatically optimize resource utilization.
Sources
Improve transfer performance of sync command in Amazon S3 | AWS re:Post
Tutorial: Transferring data between Amazon S3 buckets across AWS accounts - AWS DataSync

answered 16 days ago
reviewed 16 days ago by an EXPERT
  • Thanks a lot. To summarize: I can increase performance via max_concurrent_requests, max_queue_size, multipart_threshold, multipart_chunksize, and max_bandwidth. Do you have best practices for choosing good parameters for specific hardware configurations? I have 80 CPUs at 2.5 GHz.
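For reference, the settings mentioned in the comment are AWS CLI S3 configuration values that can be set per profile with aws configure set. The numbers below are illustrative starting points only, not recommendations tuned for any particular hardware:

```shell
# Illustrative values only -- tune against your own bandwidth and workload.
aws configure set default.s3.max_concurrent_requests 50
aws configure set default.s3.max_queue_size 10000
aws configure set default.s3.multipart_threshold 64MB
aws configure set default.s3.multipart_chunksize 16MB
aws configure set default.s3.max_bandwidth 1GB/s
```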
