Synchronising freeze/thaw using Glacier Flexible Retrieval (GFR)


I'm working with an established system that moves objects from an S3 bucket, once they reach a certain age, to another archive S3 bucket, where a lifecycle rule transitions them to Glacier Flexible Retrieval (GFR). Once in GFR, a user can restore the objects (we use restore_object in Python boto3) and copy them back to the original S3 bucket.
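For context, the restore-and-copy step looks roughly like this; the bucket names, key, and Days/Tier values are placeholders:

```python
import boto3

s3 = boto3.client("s3")

# Ask S3 to create a temporary, readable copy of the archived object.
# This kicks off an asynchronous retrieval job inside AWS.
s3.restore_object(
    Bucket="archive-bucket",
    Key="path/to/object",
    RestoreRequest={
        "Days": 7,  # how long the temporary copy stays available
        "GlacierJobParameters": {"Tier": "Standard"},
    },
)

# Later, once the restore job has completed, copy the object back
# to the live bucket.
s3.copy_object(
    Bucket="live-bucket",
    Key="path/to/object",
    CopySource={"Bucket": "archive-bucket", "Key": "path/to/object"},
)
```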

My problem is that the lifecycle transition to GFR, and the re-freezing back to GFR once the thaw has concluded, are performed asynchronously by AWS, as I understand it.

While this is understandable for large data sets, it makes automated testing very difficult because it's unpredictable when the transition will occur. It also makes live operation somewhat awkward: if a customer requests access to an object while that object is being frozen initially, or re-frozen after being temporarily thawed, we get errors and an incomplete data set. This doesn't happen very often, but when it does it usually causes issues for us.

We therefore need some way of either synchronising the freeze, so that we know when it's complete, or locking the object so that it can't be accessed while lifecycle transitions are manipulating it.

Can anyone advise a way of protecting access to avoid errors and, ideally, of synchronising the freeze process so that our unit tests are guaranteed to access the objects predictably?

Robin_W
asked 4 months ago · 103 views
1 Answer

When you restore an object that was in Glacier Flexible Retrieval or Glacier Deep Archive, you create a temporary object. At the time of restore you set an expiration for this temporary object. You can track the current expiration through the x-amz-restore response element.
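For example, boto3 surfaces x-amz-restore as the Restore field of a HeadObject response; a minimal sketch, with placeholder bucket and key names:

```python
import boto3

s3 = boto3.client("s3")

resp = s3.head_object(Bucket="archive-bucket", Key="path/to/object")
restore = resp.get("Restore")  # the x-amz-restore header, if present

if restore is None:
    print("No restore in progress or completed for this object")
elif 'ongoing-request="true"' in restore:
    print("Restore still in progress")
else:
    # e.g. 'ongoing-request="false", expiry-date="Fri, 21 Dec 2025 00:00:00 GMT"'
    print(f"Temporary copy available: {restore}")
```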

I will assume that your actual problem relates to inconsistency within a set of objects, or to working with very large objects that you download via multiple requests. S3 should never abort an in-progress request or return a partial object. But yes, if you use ranged GETs and thus make multiple independent requests, it's possible that one or more later requests will fail if the temporary object expires between the first and the last of those requests.
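To illustrate that failure mode, a sketch of the ranged-GET pattern (names and part size are illustrative); each range is a separate request, so a late range can fail with InvalidObjectState if the temporary copy expires mid-download:

```python
import boto3

s3 = boto3.client("s3")

PART_SIZE = 8 * 1024 * 1024  # 8 MiB per ranged request (illustrative)

head = s3.head_object(Bucket="archive-bucket", Key="big-object")
size = head["ContentLength"]

chunks = []
for start in range(0, size, PART_SIZE):
    end = min(start + PART_SIZE, size) - 1
    # Each ranged GET is an independent request; if the temporary copy
    # expires between requests, the next one raises InvalidObjectState.
    resp = s3.get_object(
        Bucket="archive-bucket",
        Key="big-object",
        Range=f"bytes={start}-{end}",
    )
    chunks.append(resp["Body"].read())

data = b"".join(chunks)
```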

I'd suggest that before your job runs, you review the expiration of the temporary objects via the HeadObject API and, if necessary, extend it through an additional call to RestoreObject. Calling RestoreObject on an object that already has a temporary copy will not incur additional per-GB data retrieval fees, though you will still pay the per-request cost of calling the API. You'll also pay for the S3 storage consumed by the temporary copy during the extended expiration period.
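A minimal sketch of that check-and-extend pattern; the helper name, bucket/key, and day thresholds are hypothetical:

```python
import boto3
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

s3 = boto3.client("s3")

def extend_restore_if_needed(bucket, key, min_days=3, extend_days=7):
    # Read the temporary copy's expiry from the x-amz-restore header.
    resp = s3.head_object(Bucket=bucket, Key=key)
    restore = resp.get("Restore", "")
    if 'expiry-date="' not in restore:
        return  # no completed restore to extend
    expiry_str = restore.split('expiry-date="')[1].rstrip('"')
    expiry = parsedate_to_datetime(expiry_str)
    if (expiry - datetime.now(timezone.utc)).days < min_days:
        # Re-issuing RestoreObject extends the expiration. No per-GB
        # retrieval fee is charged again, only the request cost.
        s3.restore_object(
            Bucket=bucket,
            Key=key,
            RestoreRequest={"Days": extend_days},
        )

extend_restore_if_needed("archive-bucket", "path/to/object")
```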

AWS
answered 4 months ago
  • Thanks for your reply. That response element is useful, but unfortunately I have perhaps 10^5 or 10^6 objects to restore, so polling them all isn't really viable. However, I found I can use upload_file with ExtraArgs={'StorageClass': 'GLACIER'} (see the sketch below), which saves my unit tests having to wait hours to check whether files are ready to be restored. On restoring, I think I will have to manage access separately to some extent, since the asynchronous nature of restores means I can't rely on object state from one second to the next, only on what operations I've requested.
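    The deterministic-setup trick, as a sketch (the file, bucket, and key names are placeholders):

    ```python
    import boto3

    s3 = boto3.client("s3")

    # Upload test data directly into the GLACIER storage class, so unit
    # tests don't have to wait for an asynchronous lifecycle transition.
    s3.upload_file(
        "local/testdata.bin",
        "archive-bucket",
        "path/to/object",
        ExtraArgs={"StorageClass": "GLACIER"},
    )
    ```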

  • I think I understand now: your objective is to know the object is in Glacier Flexible Retrieval, because you need to validate how your code behaves when the object is already in Glacier Flexible Retrieval and thus not immediately available. What you outlined is one option, as it creates a new object version, which takes precedence over the version that has the temporary copy in any test that doesn't specify an object version ID. In case you're not aware: if your bucket has versioning enabled, you'll pay for this new object version in addition to the version that is no longer current.

    You are also correct that there is no mechanism to immediately expire the temporary copy. You can overwrite it, as you've done, or you could delete the entire object version (including the copy in Glacier), but you can't specifically target the temporary copy, as until it expires it responds to the same versionId as the permanent copy.
