Is the AWS CLI really synchronous?

0

I was sure that the 'aws s3 cp' and 'sync' commands are synchronous, i.e. a command finishes only after all files (objects) have been copied. But recently I saw behaviour that I can't explain any other way than by these commands being asynchronous. I have a CI/CD flow that includes an S3 backup step followed by a deploy to S3, i.e.:

aws s3 sync s3://example-backet s3://example-backet-backup --delete
aws s3 sync build s3://example-backet --delete

I spent many hours debugging my workflow until I noticed that my backup is inconsistent, i.e. it includes some files from the new build, which is uploaded by the 2nd command.

Is there any documentation that confirms my theory?

Thanks, Vitaly

asked a month ago · 93 views
2 Answers
1

The AWS CLI commands 'aws s3 cp' and 'aws s3 sync' are indeed designed to be synchronous operations. They should complete only after all files or objects have been copied or synchronized. However, the behavior you're experiencing suggests a potential misunderstanding or an unexpected interaction between your commands.

The 'aws s3 sync' command is designed to synchronize directories and S3 prefixes. It recursively copies new and updated files from the source to the destination. The command will not finish until all specified files have been transferred.

In your specific case, where you're seeing inconsistent backups that include some files from the new build, there could be a few explanations:

  1. Timing: If the two sync commands are not strictly serialized (for example, they run in separate CI jobs, or one of them is started in the background), the deploy can change the source bucket while the backup is still in progress.

  2. Consistency: Amazon S3 provides strong read-after-write consistency for PUT, DELETE, and LIST operations, so a propagation delay inside S3 is unlikely to explain a mixed backup on its own.

  3. Large datasets: If you're dealing with a large number of files or very large files, the first sync takes longer, which widens the window for overlap if anything else (such as a concurrent pipeline run) writes to the source bucket during the backup.
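
One way to rule out the timing explanation is to run both commands from a single script and start the deploy only if the backup exited successfully. The following is a minimal sketch, not a definitive fix: `backup_then_deploy` is a hypothetical helper name, the bucket names are the ones from the question, and it assumes the pipeline runs the script in one shell with nothing backgrounded.

```shell
#!/usr/bin/env bash
# Sketch only: backup_then_deploy is a hypothetical helper, not part of the AWS CLI.

backup_then_deploy() {
  local src_bucket="$1" backup_bucket="$2" build_dir="$3"

  # No trailing '&': the shell blocks here until the aws process has exited.
  if ! aws s3 sync "s3://${src_bucket}" "s3://${backup_bucket}" --delete; then
    echo "backup failed; skipping deploy" >&2
    return 1
  fi

  # Reached only after the backup command has exited with status 0.
  aws s3 sync "${build_dir}" "s3://${src_bucket}" --delete
}

# Usage (not executed here):
#   backup_then_deploy example-backet example-backet-backup build
```

If the backup is still mixed with this structure in place, the overlap is more likely to come from something outside the script, such as a second pipeline run working on the same buckets.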

To address this issue, you could try adding a delay between the two sync operations or use a more robust backup strategy. For example, you could use versioning on your S3 buckets to maintain consistent point-in-time backups.
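
As a rough illustration of what a more robust strategy could look like, the sketch below enables versioning on a bucket and writes each backup into its own timestamped prefix, so one run never overwrites or deletes the files of a previous backup. `enable_versioning` and `snapshot_backup` are hypothetical helper names, and the per-run prefix layout is an assumption made for this example, not something the question or the AWS documentation prescribes.

```shell
#!/usr/bin/env bash
# Sketch only: both helpers are hypothetical wrappers around real AWS CLI commands.

# Turn on object versioning so overwritten or deleted objects keep older versions.
enable_versioning() {
  aws s3api put-bucket-versioning \
    --bucket "$1" \
    --versioning-configuration Status=Enabled
}

# Copy the source bucket into a fresh prefix of the backup bucket.
# The third argument (a timestamp label) is optional and defaults to the current UTC time.
snapshot_backup() {
  local src_bucket="$1" backup_bucket="$2"
  local stamp="${3:-$(date -u +%Y%m%dT%H%M%SZ)}"
  aws s3 sync "s3://${src_bucket}" "s3://${backup_bucket}/${stamp}/"
}

# Usage (not executed here):
#   enable_versioning example-backet
#   snapshot_backup example-backet example-backet-backup
```

Because every run targets a new prefix, `--delete` is not needed for the backup step in this layout; old snapshots have to be cleaned up separately.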

It's also worth noting that the AWS CLI performs checksum calculations for commands that perform uploads, including 'aws s3 cp' and 'aws s3 sync'. This ensures data integrity during the transfer process.

If you continue to experience unexpected behavior, it would be advisable to contact AWS support for a more in-depth investigation of your specific use case.
Sources
Amazon S3 | Strong Consistency | Amazon Web Services
AWS CLI S3 FAQ — AWS CLI 2.25.9 Command Reference
sync — AWS CLI 2.25.9 Command Reference

answered a month ago
EXPERT
reviewed a month ago
  • Good to know. Unfortunately, aside from this answer, I couldn't find any evidence for that.

0

Can you clarify what you mean by inconsistent? What are you expecting, and what exactly are you seeing in the bucket? Based on your comment, you have data from the second step in example-backet-backup. But that's normal, since the next time you run your pipeline, the first command will sync these files to the backup bucket.

answered a month ago
  • Before the deploy, I have a set of files in my bucket that I need to back up: filenameX.v1, filenameX.v1, ... The deploy replaces these files with filenameY.v2, filenameY.v2, ...

    When I run 'aws s3 sync' before the deploy step, I expect to get a copy of the v1 state. In fact, I get a mix of v1 and v2.
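
To pin down when the v2 files appear in the backup, the expectation above could be checked right after the backup step and before the deploy starts. The sketch below is one possible check, not an official procedure: `list_keys` and `unexpected_in_backup` are hypothetical helpers, and the `Contents[].[Key]` query with text output is assumed to print one key per line (an empty bucket prints `None` instead).

```shell
#!/usr/bin/env bash
# Sketch only: both helpers are hypothetical and written for this example.

# Print the object keys of a bucket, one per line, in a stable sort order.
list_keys() {
  aws s3api list-objects-v2 --bucket "$1" \
    --query 'Contents[].[Key]' --output text | LC_ALL=C sort
}

# Print keys that exist in the backup listing but not in the pre-deploy listing.
# Both files must have been produced by list_keys (same sort order).
unexpected_in_backup() {
  local before_file="$1" backup_file="$2"
  LC_ALL=C comm -13 "$before_file" "$backup_file"
}

# Usage (not executed here):
#   list_keys example-backet        > before.txt   # just before the backup
#   aws s3 sync s3://example-backet s3://example-backet-backup --delete
#   list_keys example-backet-backup > backup.txt   # right after the backup
#   unexpected_in_backup before.txt backup.txt     # should print nothing
```

If this prints v2 keys before the deploy command has even started, something other than the second command in this script is writing to one of the buckets.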
