s3 sync command is SLOW to start (On some data)


I run two CLI commands daily from my local PC. One set of data has roughly 40 new files on S3 and the other has 1200+ files.

The smaller file set jumps into action, but with the larger set I can stare at a black screen for quite a while before it starts downloading the first file. I try to keep the S3 buckets clean (removing data so that there isn't so much to sync).

Is there something that I can do to make the sync start quicker? Ultimately, I just want the downloads to complete faster. I did think about computing the CLI command to make it static so that it downloads only the current data, but because I'm not fully automated, I like the flexibility to download multiple days' worth of data.

asked 8 months ago · 840 views
2 Answers

Hi MarqueeCReW,

Are you geographically close to your S3 data? If not, consider enabling S3 Transfer Acceleration, which can speed up uploads and downloads over long distances.

The AWS CLI can run transfers in parallel. Raising the max_concurrent_requests option (the default is 10) can speed up transfers of many files. Set it in your AWS CLI configuration (~/.aws/config):

[default]
s3 =
    max_concurrent_requests = 20

You can adjust the number according to your bandwidth and system capabilities.

For large files, you can raise multipart_chunksize (the default is 8MB). This is also set in your AWS CLI configuration:

[default]
s3 =
    multipart_chunksize = 128MB
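
Both settings belong under the same s3 = key of the profile. If you'd rather not edit the file by hand, a minimal sketch using aws configure set is below. The helper name set_s3_tuning is made up for illustration, and the values are just the ones suggested above. It is written for a POSIX shell; the two aws commands themselves work the same from PowerShell or a batch file.

```shell
# Hypothetical helper (the name is illustrative, not part of the AWS CLI).
# Writes both S3 transfer settings to the default profile in ~/.aws/config.
#   $1 = max_concurrent_requests, $2 = multipart_chunksize
set_s3_tuning() {
    aws configure set default.s3.max_concurrent_requests "$1"
    aws configure set default.s3.multipart_chunksize "$2"
}

# Example call (uncomment to apply the values from this answer):
# set_s3_tuning 20 128MB
```

These settings affect transfer throughput once the download is running, so they may not change how long the command takes to start.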

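If you know specifically which files or types of files you want, you can use the --exclude and --include parameters to narrow down what's being synced. This reduces the number of files the sync command has to transfer. As far as I know, the filters are applied after the objects under the prefix have been listed, so they may not shorten the startup delay; pointing the command at a narrower prefix does.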

For example, if you know you only want .txt files:

aws s3 sync s3://your-bucket/path/ local-path/ --exclude "*" --include "*.txt"
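
If your keys happen to be grouped by day, you could also sync one day's prefix at a time, so each run only lists that day's objects. The sketch below assumes a layout like s3://your-bucket/path/YYYY-MM-DD/, which is a guess about your bucket and not something from your post. The helper name sync_days and the dates are purely illustrative.

```shell
# Hypothetical helper: sync one date-named prefix at a time.
# ASSUMPTION: keys look like <src>/<YYYY-MM-DD>/...; adjust to your real layout.
#   $1 = S3 source prefix, $2 = local destination, remaining args = day names
sync_days() {
    src="$1"
    dest="$2"
    shift 2
    for day in "$@"; do
        # ${src%/} drops a trailing slash so the joined path has exactly one.
        aws s3 sync "${src%/}/$day/" "${dest%/}/$day/"
    done
}

# Example call (dates are placeholders):
# sync_days s3://your-bucket/path local-path 2024-01-14 2024-01-15
```

This keeps the flexibility to pull several days in one go while avoiding a listing of everything under the parent prefix.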

I hope this helps! If this solution works for you, please accept the answer. Otherwise, do leave a comment, and I'll try to assist you.

answered 8 months ago
  • I'm the same distance from both sets of data. The 40+ files have already completed. The other files haven't even started yet. I tried again from PowerShell and it did the same thing as my batch file command. It doesn't make sense to me why the calculation or startup takes so long.

  • Now, all of a sudden, it started downloading. It just takes an hour or more to start.


I understand you might have considered some basic checks, but it's worth reconfirming:

  • Network Delays: There could be network-related delays originating from your end, your ISP, or any other intermediate nodes. Trying a different network or evaluating the current network's stability might provide insights.
  • Local System Resources: Other active processes on your machine might be influencing the AWS CLI's performance. It's helpful to monitor the system's performance metrics, such as CPU and memory usage, during the s3 sync operation.
  • Using --debug with AWS CLI: To get an in-depth view of what the CLI is doing, you can append the --debug option to your sync command. It produces extensive output, which can help pinpoint where the slowdown occurs.
  • Alternative Tools: While AWS CLI is a preferred choice for many, various third-party tools offer syncing capabilities with S3. Exploring an alternative might help you determine if the delay is specifically related to the AWS CLI.
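
As a rough sketch of the --debug suggestion: combining it with --dryrun lets you watch the startup phase without downloading anything, and redirecting stderr keeps the verbose log in a file. The helper name sync_preview and the log file name are illustrative only. It is written for a POSIX shell; the aws command inside it is the same from PowerShell or a batch file.

```shell
# Hypothetical helper: preview a sync and capture the debug log.
#   $1 = S3 source, $2 = local destination, $3 = log file path
sync_preview() {
    # --dryrun prints the planned operations without transferring anything.
    # --debug writes verbose diagnostics to stderr, redirected here to the log file.
    aws s3 sync "$1" "$2" --dryrun --debug 2> "$3"
}

# Example call:
# sync_preview s3://your-bucket/path/ local-path/ sync-debug.log
```

The log lines should carry timestamps, so gaps between them can show whether the wait is spent listing the bucket, scanning the local folder, or somewhere else.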
answered 8 months ago