The AWS CLI commands 'aws s3 cp' and 'aws s3 sync' are indeed designed to be synchronous operations. They should complete only after all files or objects have been copied or synchronized. However, the behavior you're experiencing suggests a potential misunderstanding or an unexpected interaction between your commands.
The 'aws s3 sync' command is designed to synchronize directories and S3 prefixes. It recursively copies new and updated files from the source to the destination. The command will not finish until all specified files have been transferred.
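One way to check this yourself is to preview the transfer and then look at the exit status of the real run. The sketch below is only an illustration: the function name `preview_and_sync` and the bucket URIs passed to it are hypothetical, not taken from your pipeline.

```shell
# Hypothetical helper: preview a sync, run it, and report the exit status.
preview_and_sync() {
  src="$1"
  dst="$2"

  # --dryrun prints the operations sync would perform without executing them.
  aws s3 sync "$src" "$dst" --dryrun || return 1

  # The real transfer. The shell does not continue until this command exits.
  aws s3 sync "$src" "$dst"
  status=$?

  echo "sync exited with status ${status}"
  return "$status"
}
```

You could call it as `preview_and_sync s3://example-bucket s3://example-bucket-backup` (placeholder names). A non-zero status means the sync did not complete cleanly, so the next pipeline step should not run.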
In your specific case, where you're seeing inconsistent backups that include some files from the new build, there could be a few explanations:
- Timing: If anything modifies the source bucket while the backup sync is still running (for example, a deploy step that starts in parallel), the backup can capture a mix of old and new files.
- Consistency: Amazon S3 provides strong read-after-write consistency for PUT, DELETE, and list operations, so a propagation delay inside S3 is unlikely to be the cause.
- Large datasets: With a large number of files or very large files, the first sync takes longer. If the two commands are not strictly sequential (for example, one runs in the background or in a separate job), the second can start while the first is still in progress, leading to overlap.
To address this issue, you could try adding a delay between the two sync operations or use a more robust backup strategy. For example, you could use versioning on your S3 buckets to maintain consistent point-in-time backups.
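As a rough sketch of such a strategy, the script below chains the two steps so the deploy runs only after the backup has exited successfully, and writes each backup under its own timestamped prefix. All names here are assumptions for illustration: the bucket names, the `BUILD_DIR` path, and the function names `backup_then_deploy` and `enable_versioning` are hypothetical and should be replaced with your own.

```shell
# Hypothetical defaults: replace with your own buckets and build directory.
SRC_BUCKET="${SRC_BUCKET:-example-bucket}"
BACKUP_BUCKET="${BACKUP_BUCKET:-example-bucket-backup}"
BUILD_DIR="${BUILD_DIR:-./build}"

# Turn on versioning so overwritten objects keep their previous versions.
enable_versioning() {
  aws s3api put-bucket-versioning \
    --bucket "$1" \
    --versioning-configuration Status=Enabled
}

backup_then_deploy() {
  # A fresh prefix per run, so earlier backups are never overwritten.
  stamp="$(date -u +%Y%m%dT%H%M%SZ)"

  # Step 1 copies the current state of the source bucket.
  # Step 2 (the deploy) runs only if step 1 exits with status 0.
  aws s3 sync "s3://${SRC_BUCKET}" "s3://${BACKUP_BUCKET}/${stamp}/" &&
    aws s3 sync "${BUILD_DIR}" "s3://${SRC_BUCKET}" --delete
}
```

Calling `backup_then_deploy` from a single pipeline step keeps both commands in one shell, which rules out the two syncs running in parallel. The timestamped prefix is a design choice, not a requirement: it trades extra storage for backups that a later run cannot alter.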
It's also worth noting that the AWS CLI performs checksum calculations for commands that perform uploads, including 'aws s3 cp' and 'aws s3 sync'. This ensures data integrity during the transfer process.
If you continue to experience unexpected behavior, it would be advisable to contact AWS support for a more in-depth investigation of your specific use case.
Sources
Amazon S3 | Strong Consistency | Amazon Web Services
AWS CLI S3 FAQ — AWS CLI 2.25.9 Command Reference
sync — AWS CLI 2.25.9 Command Reference
Can you clarify what you mean by inconsistent? What are you expecting, and what exactly are you seeing in the bucket? Based on your comment, you have data from the second step in example-backet-backup. But that's normal, since the next time you run your pipeline, the first command will sync these files to the bucket.
Before the deploy, I have a set of files in my bucket that I need to back up: filenameX.v1, filenameX.v1, ... The deploy replaces these files with filenameY.v2, filenameY.v2, ...
When I run 'aws s3 sync' before the deploy step, I expect to get a copy of the v1 state. In fact, I get a mix of v1 and v2.
Good to know. Unfortunately, aside from this answer, I couldn't find any evidence for that.