aws s3 sync --- downloading older version from source [BUG?]


AWS CLI Version: aws-cli/1.18.147 Python/2.7.18 Linux/4.14.322-246.539.amzn2.x86_64 botocore/1.18.6

Hello,

We’ve been using AWS CLI for several years, but today I noticed an odd behavior. According to the AWS S3 SYNC documentation:

Syncs directories and S3 prefixes. Recursively copies new and updated files from the source directory to the destination. Only creates folders in the destination if they contain one or more files.

This description matches our understanding of how it should work.

However, the issue we encountered today is that the sync operation is pulling down older files from the source and overwriting newer versions in the destination.

The command we ran to download the Batch123 folder from our S3 bucket to our local Batch123 folder is:

  a=Batch123; aws s3 sync s3://upload.xxxx.dlconsulting.com/JeffreyTesting/$a $a

The odd behavior we noticed is that when the file size is unchanged, the sync process (without using the --size-only or --exact-timestamps flags) re-downloads the file if the source is older than the destination version. On the other hand, if the source version is newer, it doesn’t re-download the file. Shouldn’t this behavior be the opposite? The sync process should re-download the file if the source version is newer than the destination version.

Could you confirm whether this is a bug or if we are misunderstanding how the sync works?


Destination newer than Source: Destination File Redownloaded

[Destination Timestamp: 01:28 - newer]

jeffrey@server:[01:28:31]:/home/jeffrey/temp $ ls -al Batch123/bag-info.txt
-rw-rw-r--. 1 jeffrey dlc 77 Oct  8 01:28 Batch123/bag-info.txt

[Source Timestamp: 01:20 - older]

jeffrey@server:[01:28:35]:/home/jeffrey/temp $ aws s3api head-object --bucket upload.xxxx.dlconsulting.com --key JeffreyTesting/Batch123/bag-info.txt
{
    "AcceptRanges": "bytes", 
    "ContentType": "text/plain", 
    "LastModified": "Tue, 08 Oct 2024 01:20:54 GMT", 
    "ContentLength": 77, 
    "ETag": "\"a99928a00647d5f784764b64b7cfb069\"", 
    "ServerSideEncryption": "AES256", 
    "Metadata": {}
}

[Destination File Redownloaded]

jeffrey@server:[01:29:15]:/home/jeffrey/temp $ a=Batch123; aws s3 sync s3://upload.xxxx.dlconsulting.com/JeffreyTesting/$a $a
download: s3://upload.xxxx.dlconsulting.com/JeffreyTesting/Batch123/bag-info.txt to Batch123/bag-info.txt

Source newer than Destination: Destination File is not Redownloaded

[Destination Timestamp: forced it to be 10/02 - older]

jeffrey@server:[01:58:12]:/home/jeffrey/temp $ touch -d "2024-10-02 12:30:00" Batch123/bag-info.txt 
jeffrey@server:[01:58:28]:/home/jeffrey/temp $ ls -al Batch123/bag-info.txt
-rw-rw-r--. 1 jeffrey dlc 77 Oct  2 12:30 Batch123/bag-info.txt

[Source Timestamp: 10/08 - newer]

jeffrey@server:[01:58:33]:/home/jeffrey/temp $ aws s3api head-object --bucket upload.xxxx.dlconsulting.com --key JeffreyTesting/Batch123/bag-info.txt
{
    "AcceptRanges": "bytes", 
    "ContentType": "text/plain", 
    "LastModified": "Tue, 08 Oct 2024 01:20:54 GMT", 
    "ContentLength": 77, 
    "ETag": "\"a99928a00647d5f784764b64b7cfb069\"", 
    "ServerSideEncryption": "AES256", 
    "Metadata": {}
}

[Destination File was NOT Redownloaded]

jeffrey@server:[01:58:41]:/home/jeffrey/temp $ a=Batch123; aws s3 sync s3://upload.xxxx.dlconsulting.com/JeffreyTesting/$a $a
[Nothing downloaded]
asked 7 months ago · 192 views

2 Answers

I would advise you to contact the AWS Support team, because the other answer here is AI-generated.

Thanks

EXPERT
answered 5 months ago

The behavior you're observing with the AWS CLI S3 sync command is not a bug, but it is indeed counterintuitive. The sync command is designed to copy new and updated files from the source to the destination, but it has a specific way of determining what "updated" means.

By default, the S3 sync command uses both the file size and the modification time to determine if a file needs to be updated. However, the comparison of modification times works in a way that might seem unexpected:

  1. If the destination (local) file is newer than the source object in S3, the sync command downloads the file. A local timestamp later than the S3 timestamp suggests that the local copy was changed after it was last synced, so the command treats the S3 version as authoritative and restores it.

  2. If the source object in S3 is newer than the local file and the sizes match, the sync command does not download the file. The help text for --exact-timestamps describes this default: when syncing from S3 to local, same-sized items are ignored unless the local version is newer than the S3 version.

The side effect is that a same-sized update in S3 can be missed when the local file carries an older timestamp, which is the result you observed.
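The two rules above can be sketched as a small shell function. This is only a model of the behavior reported in the question, not the AWS CLI's actual code: `should_download` is a hypothetical helper, the epoch values are illustrative rather than the real timestamps, and the case where the local file does not exist yet is not modeled.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the AWS CLI): models the default
# S3-to-local decision for a file that exists on both sides.
# Arguments: source size, source mtime (epoch seconds),
#            destination size, destination mtime (epoch seconds).
should_download() {
    local src_size=$1 src_mtime=$2 dest_size=$3 dest_mtime=$4

    # A size mismatch always triggers a transfer.
    if [ "$src_size" -ne "$dest_size" ]; then
        echo "download"
        return
    fi

    # Same size: transfer only when the local copy is newer than S3.
    if [ "$dest_mtime" -gt "$src_mtime" ]; then
        echo "download"
    else
        echo "skip"
    fi
}

# The two cases from the question, with illustrative timestamps:
should_download 77 1000 77 1500   # local newer than S3 -> download
should_download 77 1000 77 500    # local older than S3 -> skip
```

Under this model, the second case is skipped because the sizes match and the local timestamp is not later than the S3 timestamp.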

To modify this behavior, you can use additional flags:

  1. Use --exact-timestamps so that, when syncing from S3 to local, same-sized files are skipped only if their timestamps match exactly.
  2. Use --size-only to compare only the size of the files, ignoring the timestamps.

If you want to ensure that you always get the version stored in S3, regardless of local timestamps, note that aws s3 sync has no flag that forces every file to be re-downloaded. Use aws s3 cp with --recursive instead, which copies every object under the prefix and overwrites the local copies.
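A cautious way to compare these options is to preview each one with --dryrun, which prints the planned operations without transferring anything. The wrapper names below are hypothetical, and the two arguments stand in for your own S3 URI and local directory; the last wrapper uses aws s3 cp --recursive as the unconditional alternative to sync.

```shell
#!/usr/bin/env bash
# Hypothetical wrappers: $1 is the S3 source URI, $2 the local directory.
# Every command uses --dryrun, so nothing is downloaded or overwritten.

# Skip same-sized files only when timestamps match exactly.
preview_exact_timestamps() {
    aws s3 sync "$1" "$2" --exact-timestamps --dryrun
}

# Compare sizes only and ignore timestamps.
preview_size_only() {
    aws s3 sync "$1" "$2" --size-only --dryrun
}

# Copy everything under the prefix, regardless of size or timestamp.
preview_full_copy() {
    aws s3 cp "$1" "$2" --recursive --dryrun
}
```

For example, `preview_exact_timestamps s3://your-bucket/your-prefix Batch123` would list the files that --exact-timestamps considers out of date. Remove --dryrun from the chosen command once the preview matches what you expect.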

Remember that the default behavior of S3 sync is designed to minimize unnecessary data transfer, but it may not always align with specific use cases. Always test the sync command with your particular setup to ensure it behaves as expected for your needs.

answered 7 months ago
  • We probably need a real AWS engineer to verify this AI-generated comment. Unfortunately, I'm not convinced by its answer.
