Archive OpenZFS data based on access time


Is it possible to move OpenZFS data older than a certain period (say, 6 months) to another low-cost storage tier such as S3 Glacier? Most of the data needs to be kept for compliance purposes, so I am checking whether any archival solution, native or custom, is available based on atime or mtime (duration since last access)?

Thomas
asked a month ago · 179 views
2 Answers

This process is not directly supported by OpenZFS and requires a custom solution. Here's a high-level approach to automate the migration of data older than a specified period, such as 6 months, to Amazon S3 Glacier:

  1. Identify Older Files. First, use the find command on Unix-based systems to list files older than 6 months:
find /path/to/zfs/dataset -type f -mtime +180

This command lists files modified more than 180 days ago.
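Since the question asks about access time specifically, find also supports -atime. A sketch, with the caveat that on ZFS this is only meaningful if atime tracking is enabled on the dataset (check with `zfs get atime <dataset>`); with atime=off or relatime the recorded times may be stale:

```shell
# List files whose last *access* (rather than modification) was more
# than 180 days ago. The dataset path is a placeholder.
find /path/to/zfs/dataset -type f -atime +180
```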

  2. Archive and Transfer. Before moving files to S3 Glacier, consider bundling them into an archive to reduce the number of objects (and the associated per-object request costs). You can use tar or another compression tool for this purpose:
tar -czvf archive-name.tar.gz /path/to/older/files

  3. Upload to Amazon S3 Glacier. Use the AWS CLI to upload the archive to an S3 bucket, specifying a Glacier storage class:
aws s3 cp archive-name.tar.gz s3://your-bucket-name/path/to/archive/ --storage-class DEEP_ARCHIVE

The DEEP_ARCHIVE storage class offers the lowest cost storage option in S3 but with a retrieval time of 12 hours or more.

  4. Automate the Process. Create a script that performs these steps and schedule it to run periodically using cron or another scheduler.
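Putting the steps together, a minimal sketch might look like the following. The dataset path, bucket URI, and 180-day window are placeholders to adapt, not values from the thread:

```shell
#!/bin/sh
# Sketch: archive files not modified in 180+ days into a tarball and
# push it to S3 Glacier Deep Archive.
# Usage: archive_old <dataset-path> <s3-uri>
archive_old() {
    dataset=$1                  # e.g. /path/to/zfs/dataset
    bucket=$2                   # e.g. s3://your-bucket-name/path/to/archive
    stamp=$(date +%Y-%m-%d)

    # Null-delimited find|tar handles file names containing spaces.
    find "$dataset" -type f -mtime +180 -print0 \
        | tar --null -czf "archive-$stamp.tar.gz" --files-from=-

    # DEEP_ARCHIVE is the cheapest class; retrieval takes 12+ hours.
    aws s3 cp "archive-$stamp.tar.gz" "$bucket/" --storage-class DEEP_ARCHIVE
}
```

A crontab entry such as `0 2 1 * * /usr/local/bin/archive-old-files.sh` would run it monthly. Note the sketch does not delete the originals from the dataset; removal should be a separate, deliberate step once the upload is verified.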
answered a month ago
Artem reviewed a month ago

Since the question is tagged Amazon FSx for OpenZFS, what follows assumes that's where the data to be migrated is located (and not, say, on a third-party on-prem OpenZFS product). In that case, AWS DataSync is the way to go.

https://aws.amazon.com/datasync/faqs/#Data_movement

Q: Where can I move data to and from?

A: DataSync supports the following storage location types: .... Amazon Simple Storage Service (Amazon S3), .... Amazon FSx for OpenZFS file systems

Even if your data is currently on-prem it may still be worth looking into.

Q: How do I use AWS DataSync to migrate data to AWS?

A: You can use AWS DataSync to migrate data located on premises, at the edge, or in other clouds to Amazon S3

The above mentions "plain" S3, but Glacier also gets a call-out in the same section of the FAQ.

Q: How do I use AWS DataSync to archive cold data?

A: You can use AWS DataSync to move cold data from on-premises storage systems directly to durable and secure long-term storage, such as Amazon S3 Glacier Flexible Retrieval (formerly S3 Glacier) or Amazon S3 Glacier Deep Archive.

Steve_M answered a month ago
  • I had checked DataSync; while it allows moving data between FSx and S3 (I did not test it), I did not find any option to specify a rule. My requirement is not just to move data between FSx and S3, but to archive files older than a certain age. Please let me know if my understanding is incorrect. Thanks

  • I haven't tried it myself either, but according to https://docs.aws.amazon.com/datasync/latest/userguide/create-s3-location.html#using-storage-classes

    New objects copied to an S3 bucket are stored using the storage class that you specify when creating your Amazon S3 transfer location.

    The steps to create the S3 transfer location and specify the storage class are at https://docs.aws.amazon.com/datasync/latest/userguide/create-s3-location.html#create-s3-location-how-to

    To create an Amazon S3 location

    1. Open the AWS DataSync console at https://console.aws.amazon.com/datasync/

    2. In the left navigation pane, expand Data transfer, then choose Locations and Create location.

    3. For Location type, choose Amazon S3.

    4. For S3 bucket, choose the bucket that you want to use as a location. (When creating your DataSync task later, you specify whether this location is a transfer source or destination.)

    5. For S3 storage class, choose a storage class that you want your objects to use when Amazon S3 is a transfer destination.
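    The console steps above can also be sketched with the AWS CLI. The bucket ARN and IAM role ARN below are placeholders (assumptions, not values from the thread); the role must grant DataSync access to the bucket:

```shell
# Sketch: create a DataSync S3 location whose transferred objects
# are written with the DEEP_ARCHIVE storage class.
# Both ARNs are placeholders -- substitute your own.
aws datasync create-location-s3 \
  --s3-bucket-arn arn:aws:s3:::your-bucket-name \
  --s3-storage-class DEEP_ARCHIVE \
  --s3-config BucketAccessRoleArn=arn:aws:iam::123456789012:role/datasync-s3-access
```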
