
S3 boto3 copy_object validation


A customer is planning to build a tool using boto3 that will perform copy_object operations between buckets. They are asking the following:

  1. What can go wrong that may cause silent data corruption during an S3-to-S3 copy?
  2. Is there an internal mechanism to detect such corruption? If yes, is it handled transparently, or is an error returned to the client, which is then expected to retry?
  3. If there is no internal mechanism, how can the tool validate that such corruption did not occur?

I would be happy to hear suggestions regarding the first and second questions. On the third question, my understanding is that it depends:

1 No encryption or with SSE-S3

The ETag should remain the same if the object was stored with a single upload, or if the copy operation used the same number and size of parts as the original multipart upload.
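To illustrate why the part layout matters, here is a minimal sketch of how such an ETag can be reproduced locally. It assumes the commonly observed behavior for unencrypted and SSE-S3 objects: a single-part upload gets the MD5 of the data, and a multipart upload gets the MD5 of the concatenated per-part MD5 digests followed by `-<part count>`. The multipart format is not a documented guarantee, and the function name `s3_style_etag` is hypothetical.

```python
import hashlib
from typing import Optional


def s3_style_etag(data: bytes, part_size: Optional[int] = None) -> str:
    """Return the ETag S3 is commonly observed to assign to `data`.

    part_size=None models a single-part upload (plain MD5).
    Otherwise `data` (assumed non-empty) is split into parts of
    `part_size` bytes, as a multipart upload would do.
    """
    if part_size is None:
        return hashlib.md5(data).hexdigest()
    if part_size <= 0:
        raise ValueError("part_size must be positive")

    # MD5 each part separately, keeping the raw 16-byte digests.
    part_digests = [
        hashlib.md5(data[offset:offset + part_size]).digest()
        for offset in range(0, len(data), part_size)
    ]
    # The multipart ETag is the MD5 of the concatenated digests,
    # suffixed with the number of parts.
    combined = hashlib.md5(b"".join(part_digests)).hexdigest()
    return f"{combined}-{len(part_digests)}"
```

The same bytes produce different ETags under different part sizes, so an ETag mismatch between source and destination does not by itself prove corruption.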

2 With KMS encryption

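Same as above, assuming the same KMS key was used. If the key is not the same, the stored content will be different. Note that for SSE-KMS objects the ETag is not an MD5 digest of the object data, so an ETag comparison is a weaker check in this case.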

There is always the option to download the object and calculate a checksum, but it is the slowest and most expensive option.
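The two approaches can be combined: compare sizes and ETags first, and only stream both objects through a hash when the ETags differ. The following is a rough sketch under stated assumptions, not a definitive implementation. `copy_and_verify` and `sha256_of_object` are hypothetical helper names, `s3` is expected to be a boto3 S3 client (`boto3.client("s3")`), and the same key is used in both buckets. An ETag match is only a meaningful content check where the ETag is derived from the object data.

```python
import hashlib

CHUNK_SIZE = 8 * 1024 * 1024  # read objects 8 MiB at a time


def sha256_of_object(s3, bucket, key):
    """Stream one object through SHA-256 without holding it all in memory."""
    body = s3.get_object(Bucket=bucket, Key=key)["Body"]
    digest = hashlib.sha256()
    while True:
        chunk = body.read(CHUNK_SIZE)
        if not chunk:
            break
        digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(s3, src_bucket, key, dst_bucket):
    """Copy an object, then return 'etag-match', 'hash-match' or 'mismatch'."""
    src = s3.head_object(Bucket=src_bucket, Key=key)
    s3.copy_object(
        Bucket=dst_bucket,
        Key=key,
        CopySource={"Bucket": src_bucket, "Key": key},
    )
    dst = s3.head_object(Bucket=dst_bucket, Key=key)

    # A size difference is the cheapest possible signal of a bad copy.
    if src["ContentLength"] != dst["ContentLength"]:
        return "mismatch"
    # Cheap check: identical ETags need no download.
    if src["ETag"] == dst["ETag"]:
        return "etag-match"
    # ETags can differ for benign reasons (different part layout or
    # encryption settings), so fall back to hashing both objects.
    src_hash = sha256_of_object(s3, src_bucket, key)
    dst_hash = sha256_of_object(s3, dst_bucket, key)
    return "hash-match" if src_hash == dst_hash else "mismatch"
```

A call would look like `copy_and_verify(boto3.client("s3"), "source-bucket", "path/to/key", "dest-bucket")`, where the bucket and key names are placeholders. A single copy_object call handles objects up to 5 GB; larger objects need a multipart copy, which this sketch does not cover.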

AWS
asked 6 years ago, 847 views
1 Answer
Accepted Answer
  1. In any data storage system there is a possibility of data degradation, or bit rot. With S3 that possibility is extremely low, given the 11 nines (99.999999999%) of durability that the service provides. This means that if you store 10,000,000 objects in S3, you can on average expect to lose a single object once every 10,000 years. This risk can be further mitigated by periodic recovery testing and by keeping another copy using cross-region replication.
  2. Yes. The S3 service uses checksums, including CRCs, to detect and repair corruption. This is done automatically in the background, and no error is returned to the customer. More details on this here
  3. See my comments above
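The durability figure in point 1 can be checked with a short calculation. This is a sketch that assumes the 11 nines are read as an average annual per-object loss rate of 1 in 10^11; the variable names are illustrative only.

```python
from fractions import Fraction

# 11 nines of durability, kept as an exact fraction to avoid float rounding.
durability = Fraction(99_999_999_999, 100_000_000_000)
annual_loss_rate = 1 - durability  # 1 in 10^11 per object per year

objects_stored = 10_000_000
expected_losses_per_year = objects_stored * annual_loss_rate  # 1 in 10^4

# Average number of years between single-object losses.
years_per_expected_loss = 1 / expected_losses_per_year
print(years_per_expected_loss)  # 10000
```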
AWS
answered 6 years ago
