'AWS suggested' way to deal with Object Lock, Glacier, and File Gateway (i.e. lifecycle logic / Lambdas)


I'm interested in the 'AWS suggested' ways to handle the mix of File Gateway, Object Lock, and Deep Glacier transitions, because it feels like it's easy to set up wrong and inadvertently treble storage costs. (This is somewhat linked to other questions about how uploads to the Storage Gateway in File Gateway mode work when there is a need to hash data after copying to the NFS share.)

I'm noticing for each upload, two object versions are made.

I read (here) that this is because of the way data is written to NFS: the file metadata is transferred after the file contents (which the Storage Gateway will already have started to upload). So before I even start looking at lifecycle rules, I already have two object versions for each file (they both seem to be the same size, so storage costs double right off the bat)... not ideal.

I then want to use a lifecycle rule to move data to Deep Glacier. But then I have one object version in Deep Glacier and one version in S3 Standard (increasing storage costs).

I want to take advantage of Object Lock to protect from ransomware 100% (that is, after all, one advantage of WORM). However, because I now have multiple object versions, I need lifecycle rules that delete the older versions to save on costs. BUT if you ONLY keep the latest version, there is a significant risk of ransomware encrypting data that is waiting to move to Deep Glacier (as lifecycle rules only run once per day). This would create a new 'encrypted' object version, which then potentially gets moved to Deep Glacier as 'the most recent' object, while the old versions (i.e. the unencrypted version that represents the actual object WE uploaded and is not affected by the ransomware) are removed. The result is a mix of ransomware-affected and clean objects in Deep Glacier, with no way of knowing which is which.

I hope that scenario makes sense?

We cannot just blindly keep the most recent object version, as it could have been affected by ransomware, but we cannot, and do not want to, keep a million object versions (of which a minimum of two are always made because of the way NFS writes the metadata second).

As a side note, I want to put legal holds on Deep Glacier objects, because we effectively want indefinite retention but still need the ability to remove and delete objects when situations arise.
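For reference, a legal hold is set per object version and has no expiry date, so it stays until someone switches it off. A minimal sketch of the request parameters follows, assuming boto3's `put_object_legal_hold` is the call used. The helper name, bucket, key, and version ID are hypothetical placeholders, and the call itself is left in a comment because it needs credentials and a bucket with Object Lock enabled.

```python
def legal_hold_request(bucket, key, version_id, on=True):
    """Build keyword arguments for boto3's put_object_legal_hold.

    The helper name is hypothetical; the dictionary keys follow the
    parameters that the boto3 S3 client accepts for that call.
    """
    return {
        "Bucket": bucket,
        "Key": key,
        "VersionId": version_id,
        "LegalHold": {"Status": "ON" if on else "OFF"},
    }


# Placeholder values for illustration only.
params = legal_hold_request("my-archive-bucket", "projects/data.tar", "EXAMPLE-VERSION-ID")
print(params)

# With boto3 installed and credentials configured (not run here):
#   import boto3
#   boto3.client("s3").put_object_legal_hold(**params)
```

Calling the helper with `on=False` builds the request that releases the hold again, which is the "remove and delete as needed" half of the requirement.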

How do AWS envisage this is supposed to work? I would appreciate some insight into the types of lifecycle policies and/or Lambda functions that would be needed, purely from a 'system logic' perspective (i.e. rough ideas of what needs to be done; I'm not asking for documented examples, unless they already exist).


PS. Not having versioning and Object Lock doesn't help the main aim of protecting the data from ransomware / malicious code, as another way to destroy this 'archive data lake' would be to rm -rf from the NFS mount, which without Object Lock / versioning would delete all the data.

1 Answer

It sounds like you don't want to keep millions of object versions but also can't disable versioning because of possible ransomware attacks. I'd suggest using S3 Lifecycle together with the Intelligent-Tiering storage class.

  • You can create a lifecycle rule to keep only a certain number of versions. For instance, keep the current version plus a maximum of five newer noncurrent versions, and expire noncurrent versions once they have been noncurrent for more than 5 days. In the S3 console -> lifecycle configuration, see the "Number of newer versions to retain - Optional" setting.
  • Although Glacier Deep Archive's storage cost is low, its PUT requests are quite expensive, and it has a minimum storage duration of 180 days. Unless the data is genuinely archival, it is often not the best option. In some cases the request fees can be higher than the storage cost.
  • You may instead consider the Intelligent-Tiering storage class. You can put objects directly into this class through the Storage Gateway file share settings, and it will automatically move your objects to the most suitable access tier. You can additionally enable the Archive Access tiers, which automatically move objects to the Deep Archive Access tier at the same storage cost as Glacier Deep Archive. Although Intelligent-Tiering charges a monitoring fee, it can be much more cost effective once you account for the PUT request fees of the regular Glacier Deep Archive class. Movement between tiers within Intelligent-Tiering is free of charge.
  • You can enable default retention for Object Lock in Governance mode. In this case, only users with permission to bypass governance retention can remove an object version within the retention period. Once retention expires, any user allowed to delete from the bucket can remove the object version.
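
The lifecycle and Object Lock suggestions above can be sketched as plain configuration dictionaries. This is a minimal sketch, not a tested policy: the function names, the rule ID, and the 30-day retention period are assumptions, and the 5-day / 5-version numbers simply reuse the example from the first bullet. The dictionaries follow the shapes accepted by boto3's `put_bucket_lifecycle_configuration` and `put_object_lock_configuration`; the calls are left in comments because they need credentials and a real bucket.

```python
import json


def noncurrent_version_rule(noncurrent_days=5, newer_versions=5):
    """Lifecycle rule that expires old noncurrent versions.

    A noncurrent version is expired only once it has been noncurrent for
    more than `noncurrent_days` AND more than `newer_versions` newer
    noncurrent versions exist.
    """
    return {
        "ID": "expire-old-noncurrent-versions",  # hypothetical rule name
        "Status": "Enabled",
        "Filter": {"Prefix": ""},  # empty prefix: applies to the whole bucket
        "NoncurrentVersionExpiration": {
            "NoncurrentDays": noncurrent_days,
            "NewerNoncurrentVersions": newer_versions,
        },
    }


def governance_default_retention(days=30):
    """Default Object Lock retention in Governance mode (days is an assumption)."""
    return {
        "ObjectLockEnabled": "Enabled",
        "Rule": {"DefaultRetention": {"Mode": "GOVERNANCE", "Days": days}},
    }


lifecycle = {"Rules": [noncurrent_version_rule()]}
object_lock = governance_default_retention()
print(json.dumps(lifecycle, indent=2))
print(json.dumps(object_lock, indent=2))

# With boto3 installed and credentials configured (not run here):
#   import boto3
#   s3 = boto3.client("s3")
#   s3.put_bucket_lifecycle_configuration(
#       Bucket="my-archive-bucket", LifecycleConfiguration=lifecycle)
#   s3.put_object_lock_configuration(
#       Bucket="my-archive-bucket", ObjectLockConfiguration=object_lock)
```

Note that lifecycle expiration does not override Object Lock: a version that is still inside its retention period, or under a legal hold, should stay in place until the protection ends, so the retention period effectively sets a floor under the lifecycle numbers.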
answered 7 months ago
  • Thank you for taking the time to respond...

    I'm still confused as to the best way to deal with 2 versions being created immediately when a file is copied to the File Gateway NFS share, or why AWS doesn't already have a solution to mitigate this, as I see NO use case where that behaviour is actually desirable...

    I'm going to assume that if there were a way to delay the File Gateway 'upload' until after the file was completely written, this would stop the 2-version thing? As far as I can see, a user-configurable option to delay upload from the on-prem cache by a set time (0 for immediate, or any whole number of minutes/hours) would solve most of these issues...

    Object locking is an important option for archive data, but it's also kind of annoying that the whole versioning thing makes lifecycle management harder.

  • Sorry to comment again. I was just wondering if it was possible to get 'some' level of feedback on my previous comment re: dealing with the 2 object versions that are immediately created upon upload. It seems like I may need some form of Lambda that triggers on each PUT into the bucket from the gateway, checks whether the object is the newest, and automatically removes the 2nd version... It does seem a bit of a 'hacky' way to deal with a problem that (in my opinion) shouldn't really exist in the first place.
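
The selection logic such a Lambda would need can be sketched without touching AWS at all. This is only a rough sketch under stated assumptions: the function name and the 5-minute window are hypothetical, the input is assumed to look like the `Versions` entries returned by boto3's `list_object_versions` for a single key, and it assumes the gateway's two versions share the same size and ETag, which the thread does not confirm.

```python
from datetime import datetime, timedelta, timezone


def redundant_upload_versions(versions, window=timedelta(minutes=5)):
    """Return VersionIds of versions that look like gateway duplicates.

    `versions` holds dicts for ONE key with VersionId, LastModified,
    Size and ETag. An older version is flagged when the next newer
    version has the same Size and ETag and was written within `window`.
    """
    ordered = sorted(versions, key=lambda v: v["LastModified"])
    redundant = []
    for older, newer in zip(ordered, ordered[1:]):
        same_content = (older["Size"] == newer["Size"]
                        and older["ETag"] == newer["ETag"])
        close_in_time = newer["LastModified"] - older["LastModified"] <= window
        if same_content and close_in_time:
            redundant.append(older["VersionId"])
    return redundant


t0 = datetime(2023, 1, 1, 12, 0, tzinfo=timezone.utc)
versions = [
    {"VersionId": "v1", "LastModified": t0, "Size": 1024, "ETag": '"abc"'},
    {"VersionId": "v2", "LastModified": t0 + timedelta(seconds=20),
     "Size": 1024, "ETag": '"abc"'},
    {"VersionId": "v3", "LastModified": t0 + timedelta(days=2),
     "Size": 1024, "ETag": '"def"'},
]
print(redundant_upload_versions(versions))  # ['v1']
```

A version written days later with different content (`v3` here) is deliberately left alone, since that is exactly the case that might be ransomware and needs a separate decision. Actually deleting the flagged versions is a separate step: under Governance mode it requires permission to bypass retention, and under Compliance mode a version cannot be deleted until its retention period ends.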
