When you restore an object that was in Glacier Flexible Retrieval or Glacier Deep Archive, you create a temporary object. At the time of restore you set an expiration for this temporary object. You can track the current expiration through the x-amz-restore response element.
I will assume that your actual problem relates to inconsistency within a set of objects, or to very large objects that you download via multiple requests. S3 should never abort an in-progress request or return a partial object. But yes, if you use Ranged GETs and thus make multiple independent requests, it's possible that one or more later requests would fail if the temporary object expired between the first and the last of those requests.
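For instance, a minimal sketch of one such ranged request with boto3 (bucket and key names are placeholders):

```python
import boto3

s3 = boto3.client('s3')

# Each Range request is an independent GET, so a temporary restored copy
# could expire between the first and the last of these calls.
part = s3.get_object(
    Bucket='my-bucket',          # hypothetical name
    Key='restored-object',       # hypothetical name
    Range='bytes=0-8388607',     # first 8 MiB
)
data = part['Body'].read()
```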
I'd suggest that, before your job runs, you review the expiration of the temporary objects via the HeadObject API and, if necessary, extend it with an additional call to RestoreObject. Calling RestoreObject on an object that already has a temporary object will not incur additional $/GB data retrieval fees, though you will still pay the per-request cost of calling the API. You'll also pay for the S3 storage consumed by the temporary object during the longer expiration window.
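As a minimal sketch of that check-then-extend flow, assuming boto3 and placeholder bucket/key names (the Restore field of the HeadObject response surfaces the x-amz-restore header mentioned above):

```python
import boto3

s3 = boto3.client('s3')
bucket, key = 'my-bucket', 'archived/data.bin'  # hypothetical names

# HeadObject returns the x-amz-restore header as the 'Restore' field, e.g.
# 'ongoing-request="false", expiry-date="Fri, 21 Dec 2029 00:00:00 GMT"'
head = s3.head_object(Bucket=bucket, Key=key)
print(head.get('Restore'))

# Re-issuing RestoreObject extends the temporary copy's lifetime; no new
# $/GB retrieval fee, but the per-request cost and extra storage still apply.
s3.restore_object(
    Bucket=bucket,
    Key=key,
    RestoreRequest={
        'Days': 7,
        'GlacierJobParameters': {'Tier': 'Standard'},
    },
)
```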
Thanks for your reply. That response element is useful, but unfortunately I have probably 10^5 or 10^6 objects to restore, so polling them all isn't really viable. However, I found I can use upload_file, passing a parameter of ExtraArgs={'StorageClass': 'GLACIER'}, which saves my UTs from having to wait hours to check whether files are ready to be restored (see the sketch below). On restoring, I will to an extent have to manage access separately, I think, since the asynchronous nature of how restores work means I can't rely on object state from one second to the next, only on what operations I've requested.
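As a sketch of that test setup, assuming boto3 and hypothetical file/bucket names:

```python
import boto3

s3 = boto3.client('s3')

# Upload the test fixture directly into Glacier Flexible Retrieval, so the
# object is immediately in the "archived, not yet restored" state the tests
# need, with no hours-long wait for a lifecycle transition.
s3.upload_file(
    Filename='fixture.bin',          # hypothetical local file
    Bucket='my-test-bucket',         # hypothetical name
    Key='archived/fixture.bin',      # hypothetical name
    ExtraArgs={'StorageClass': 'GLACIER'},
)
# A subsequent GetObject on this key fails with InvalidObjectState
# until a RestoreObject request completes.
```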
I think I understand now: your objective is to know the object is in Glacier Flexible Retrieval, because you need to validate how your code behaves when the object is already in Glacier Flexible Retrieval and thus not immediately available. What you outlined is one option, as it creates a new object version, which takes precedence over the version that has the temporary copy in any test that doesn't specifically pass an object version id. In case you're not aware: if your bucket has versioning enabled, you'll pay for this new object version in addition to the version that is no longer current.

You are also correct that there is no mechanism to immediately expire the temporary copy. You can overwrite it, as you've done, or you could delete the entire object version (including the copy in Glacier), but you can't specifically target the temporary copy, because until it expires it responds to the same versionId as the permanent copy (see the sketch below).
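For completeness, a minimal sketch of deleting the whole version (permanent Glacier copy and temporary restored copy together), assuming boto3 and hypothetical bucket/key/version values:

```python
import boto3

s3 = boto3.client('s3')

# Deleting a specific version removes the permanent Glacier copy and, with
# it, the temporary restored copy; the two share a single versionId.
s3.delete_object(
    Bucket='my-test-bucket',         # hypothetical name
    Key='archived/fixture.bin',      # hypothetical name
    VersionId='example-version-id',  # hypothetical value
)
```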