Copying files from AWS S3 to Glacier Flexible Retrieval in another S3 bucket


I'm looking to change some Python 3 code which currently copies an SSE-C encrypted file from one S3 bucket to another. On the target bucket the file is to be stored in the Glacier Flexible Retrieval storage class, and that is currently achieved by a lifecycle transition rule set on the target bucket. However, this causes a number of problems: the transition is asynchronous, so we can't predict when it will take place.
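
To illustrate the problem: the only way I know to see whether the transition has happened is to check the target object's metadata after the fact, for example with head_object. A minimal sketch of that check, where the bucket, key and SSE-C key material are placeholders (head_object on an SSE-C object needs the key):

import boto3

client = boto3.client('s3')

# Placeholder SSE-C key material (32 bytes for AES256); supply your real key
sse_key = b'0' * 32

head = client.head_object(
    Bucket='target-bucket',
    Key='mykey',
    SSECustomerAlgorithm='AES256',
    SSECustomerKey=sse_key,
)
# 'StorageClass' is omitted from the response while the object is still STANDARD
# and only reports 'GLACIER' after the asynchronous lifecycle transition has run
print(head.get('StorageClass', 'STANDARD'))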

Ideally, I would like to copy the file so that the target object is written directly to the Glacier Flexible Retrieval (GFR) storage class. However, when I looked at the documentation for the boto3 S3 client copy method, the allowed arguments for ExtraArgs as documented here are only:

ALLOWED_DOWNLOAD_ARGS = ['ChecksumMode', 'VersionId', 'SSECustomerAlgorithm', 'SSECustomerKey', 'SSECustomerKeyMD5', 'RequestPayer', 'ExpectedBucketOwner']

So StorageClass is not in that list, which implies it's not possible. I looked at copy_object as well, but it doesn't support files larger than 5 GB, which this system occasionally has to handle.
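
For what it's worth, for objects under the 5 GB limit copy_object does accept StorageClass and MetadataDirective directly, alongside the SSE-C parameters. A minimal sketch of what that would look like, with placeholder bucket names, key and key material; it just doesn't help for our larger files:

import boto3

client = boto3.client('s3')

# Placeholder SSE-C key material (32 bytes for AES256); supply your real key
sse_key = b'0' * 32

response = client.copy_object(
    CopySource={'Bucket': 'source-bucket', 'Key': 'mykey'},
    Bucket='target-bucket',
    Key='mykey',
    StorageClass='GLACIER',                   # Glacier Flexible Retrieval
    MetadataDirective='COPY',
    CopySourceSSECustomerAlgorithm='AES256',  # decrypt the SSE-C source
    CopySourceSSECustomerKey=sse_key,
    SSECustomerAlgorithm='AES256',            # re-encrypt the target with SSE-C
    SSECustomerKey=sse_key,
)
print(response['CopyObjectResult'])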

However, I then saw this answer on Stack Overflow to a related question, which seems to imply that it can be done, citing an AWS documentation page that appears to contradict the above:

You can also change the storage class of an object that is already stored in Amazon S3 to any other storage class by making a copy of the object by using the PUT Object - Copy API operation. However, you can't use PUT Object - Copy to copy objects that are stored in the S3 Glacier Flexible Retrieval or S3 Glacier Deep Archive storage classes. You also can't transition from S3 One Zone-IA to S3 Glacier Instant Retrieval.

You copy the object in the same bucket by using the same key name and specifying the request headers as follows:

Set the x-amz-metadata-directive header to COPY.

Set the x-amz-storage-class header to the storage class that you want to use.

The code given is as follows:

import boto3

s3 = boto3.client('s3')

copy_source = {
    'Bucket': 'mybucket',
    'Key': 'mykey'
}

s3.copy(
  copy_source, 'mybucket', 'mykey',
  ExtraArgs = {
    'StorageClass': 'STANDARD_IA',
    'MetadataDirective': 'COPY'
  }
)

However, judging by the first reference above, neither ExtraArgs key would be valid for the S3 client copy operation, so I'm not sure whether this can be relied upon. Also, that example copies the object onto itself (same bucket and key), so it changes the object in place rather than copying it elsewhere.

Ideally, I would like to alter my code to:

    extra_args = {
        'CopySourceSSECustomerAlgorithm': <algorithm-string>,
        'CopySourceSSECustomerKey': <plaintext-key>,
        'SSECustomerAlgorithm': <algorithm-string>,
        'SSECustomerKey': <plaintext-key>,
        'StorageClass': 'GLACIER',  # Adding these two directives to action an immediate
        'MetadataDirective': 'COPY' # transition to Glacier Flexible Retrieval in the Target
    }
...
    response = client.copy(source, target_bucket, target_key, ExtraArgs=extra_args)

(You might notice that the code uses CopySourceSSECustomerAlgorithm and CopySourceSSECustomerKey. I'm afraid I don't know the history behind that, but those keys are clearly not in the limited list given above either, although they work, or at least are not rejected.)
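
As far as I can tell, the managed copy validates ExtraArgs locally and raises ValueError for keys it doesn't allow, so one quick check I can run is whether the installed boto3/s3transfer version even accepts the keys before any API call is made. A rough sketch, with placeholder names, and the validation behaviour being my assumption about the library version in use:

import boto3

client = boto3.client('s3')

extra_args = {
    'StorageClass': 'GLACIER',
    'MetadataDirective': 'COPY',
}

try:
    client.copy(
        {'Bucket': 'source-bucket', 'Key': 'mykey'},
        'target-bucket', 'mykey',
        ExtraArgs=extra_args,
    )
except ValueError as err:
    # s3transfer appears to reject unknown ExtraArgs keys before calling S3
    print(f'ExtraArgs key rejected locally: {err}')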

I am planning to try this out but would be grateful if anyone can confirm if it's supported or if I've misunderstood how this works. Thanks!

Robin_W
asked 13 days ago, 265 views
1 Answer

The approach you're considering involves using the copy method of the boto3 S3 client and specifying the StorageClass and MetadataDirective in the ExtraArgs parameter.

While the StorageClass parameter is not officially documented as part of the allowed arguments for the copy method, the approach you found on Stack Overflow suggests that it might work.

However, it's essential to note that undocumented parameters may not be officially supported and could potentially change in future versions of the SDK.

Based on the info you provided, here are a few points to consider:

The approach you found on Stack Overflow seems to have worked for others, but it's not officially documented. Therefore, there's a level of uncertainty about its reliability and future compatibility.

Before proceeding, it's a good idea to test this approach thoroughly in a non-production environment to ensure it meets your requirements and behaves as expected. Additionally, consider reaching out to AWS Support to clarify this with them.
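
For example, one way to reduce the uncertainty before testing is to inspect the whitelist that your installed s3transfer version uses for managed copies. A minimal sketch; TransferManager.ALLOWED_COPY_ARGS is where current s3transfer releases appear to keep that list, though it is an internal detail and could change between versions:

from s3transfer.manager import TransferManager

# List the ExtraArgs keys the installed s3transfer accepts for managed copies
print(sorted(TransferManager.ALLOWED_COPY_ARGS))

# Check the two keys in question
for key in ('StorageClass', 'MetadataDirective'):
    print(key, key in TransferManager.ALLOWED_COPY_ARGS)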

EXPERT
answered 13 days ago
AWS EXPERT
reviewed 13 days ago
  • Thanks for your reply. Yes, that was actually the point of my question: I would like to know whether this is officially supported. I cannot make changes to this system while there are questions over whether the process is officially supported. The AWS pages I've seen only appear to be ambiguous on this point. Could you advise who in AWS would be best to confirm this?
