I'm looking to change some Python 3 code which currently copies an SSE-C encrypted file from one S3 bucket to another. On the target bucket, the file is stored in the Glacier Flexible Retrieval storage class, which is currently achieved by a lifecycle transition configured on the target bucket. However, this causes a number of problems because the transition is asynchronous, so we can't predict when it will take place.
Ideally, I would like to copy the file so that the target is written directly to the GFR storage class. However, when I looked at the documentation for the boto3 S3 client copy, I saw that the allowed arguments for ExtraArgs as documented here are only:
ALLOWED_DOWNLOAD_ARGS = ['ChecksumMode', 'VersionId', 'SSECustomerAlgorithm', 'SSECustomerKey', 'SSECustomerKeyMD5', 'RequestPayer', 'ExpectedBucketOwner']
So StorageClass is not in that list, which implies that it's not possible. I looked at copy_object as well, but it doesn't support files larger than 5 GB, which this system occasionally has to deal with.
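For objects under that 5 GB limit, though, copy_object does appear to accept StorageClass and the SSE-C parameters directly in a single request. A minimal sketch of what I mean, where the bucket/key names and the helper function are my own invention:

```python
def build_copy_object_kwargs(src_bucket, src_key, dst_bucket, dst_key, key_material):
    """Assemble kwargs for a single-request copy_object call.

    Note: copy_object only works for objects up to 5 GB, which is exactly
    why it doesn't solve my problem here, but it does take StorageClass.
    """
    return {
        'CopySource': {'Bucket': src_bucket, 'Key': src_key},
        'Bucket': dst_bucket,
        'Key': dst_key,
        # Write the target straight into Glacier Flexible Retrieval.
        'StorageClass': 'GLACIER',
        'MetadataDirective': 'COPY',
        # SSE-C: the source must be decrypted and the target re-encrypted.
        'CopySourceSSECustomerAlgorithm': 'AES256',
        'CopySourceSSECustomerKey': key_material,
        'SSECustomerAlgorithm': 'AES256',
        'SSECustomerKey': key_material,
    }

# Usage (not run here):
# import boto3
# s3 = boto3.client('s3')
# s3.copy_object(**build_copy_object_kwargs('src-bucket', 'k', 'dst-bucket', 'k', key))
```

So the small-file path is not the problem; it's the multipart-capable managed copy that I can't pin down.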
However, I then saw this answer on Stack Overflow to a related question, which seems to imply that it can be done, referring to a seemingly contradictory AWS page which suggests:
You can also change the storage class of an object that is already stored in Amazon S3 to any other storage class by making a copy of the object by using the PUT Object - Copy API operation. However, you can't use PUT Object - Copy to copy objects that are stored in the S3 Glacier Flexible Retrieval or S3 Glacier Deep Archive storage classes. You also can't transition from S3 One Zone-IA to S3 Glacier Instant Retrieval.
You copy the object in the same bucket by using the same key name and specifying the request headers as follows:
Set the x-amz-metadata-directive header to COPY.
Set the x-amz-storage-class header to the storage class that you want to use.
The code given is as follows:
import boto3

s3 = boto3.client('s3')
copy_source = {
    'Bucket': 'mybucket',
    'Key': 'mykey'
}
s3.copy(
    copy_source, 'mybucket', 'mykey',
    ExtraArgs={
        'StorageClass': 'STANDARD_IA',
        'MetadataDirective': 'COPY'
    }
)
However, judging by the first reference above, neither key in ExtraArgs would be valid for the S3 client copy operation, so I'm confused about whether this can be relied upon. Also, since it copies over the same bucket and key, it changes the object in situ rather than copying it elsewhere.
Ideally, I would like to alter my code to:
extra_args = {
    'CopySourceSSECustomerAlgorithm': <algorithm-string>,
    'CopySourceSSECustomerKey': <plaintext-key>,
    'SSECustomerAlgorithm': <algorithm-string>,
    'SSECustomerKey': <plaintext-key>,
    'StorageClass': 'GLACIER',   # Adding these two directives to action an immediate
    'MetadataDirective': 'COPY'  # transition to Glacier Flexible Retrieval in the target
}
...
response = client.copy(source, target_bucket, target_key, ExtraArgs=extra_args)
(You might notice here that the code uses CopySourceSSECustomerAlgorithm and CopySourceSSECustomerKey. I'm afraid I don't know the history behind that, but those are also not in the limited list given above, although they clearly work, or at least are not rejected.)
I am planning to try this out, but I would be grateful if anyone can confirm whether it's supported or whether I've misunderstood how this works. Thanks!
Thanks for your reply. Yes, that was actually the point of my question: I would like to know whether this is officially supported. I cannot make changes to this system while there are questions over whether the process is officially supported. There only appears to be ambiguity on the AWS pages that I've seen. Could you advise who at AWS would be best placed to confirm this?