Archive OpenZFS data based on access time

0

Is it possible to move OpenZFS data older than a certain period (say, 6 months) to a lower-cost storage tier such as S3 Glacier? Most of the data needs to be kept for compliance purposes, so I am checking whether any archival solution, native or custom, is available based on atime or mtime (that is, the time since last access or modification).

Thomas
asked 2 months ago, 190 views
2 Answers
1

This process is not directly supported by OpenZFS and requires a custom solution. Here's a high-level approach to automate the migration of data older than a specified period, such as 6 months, to Amazon S3 Glacier:

  1. Identify older files. Use the find command to list the files in the dataset that are older than 6 months:
find /path/to/zfs/dataset -type f -mtime +180

This command lists files last modified more than 180 days ago. To select by last access instead, use -atime +180; this is only meaningful if the dataset records access times (the ZFS atime property).

  2. Archive. Before moving files to S3 Glacier, consider bundling them into a compressed archive to reduce the number of objects and possibly save on costs. You can use tar or another archiving tool for this purpose:
tar -czvf archive-name.tar.gz /path/to/older/files

  3. Upload to Amazon S3 Glacier. You can use the AWS CLI to upload the archive to an S3 bucket and place it directly in a Glacier storage class:
aws s3 cp archive-name.tar.gz s3://your-bucket-name/path/to/archive/ --storage-class DEEP_ARCHIVE

The DEEP_ARCHIVE storage class is the lowest-cost storage option in S3, but standard retrievals take up to 12 hours and bulk retrievals up to 48 hours.

  4. Automate the process. Create a script that performs these steps and schedule it to run periodically using cron or another scheduling tool.
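The steps above could be combined into one script along the following lines. This is a minimal sketch rather than a tested production tool: DATASET, BUCKET, AGE_DAYS, TIME_TEST and DRY_RUN are placeholder names introduced here for illustration, and the find -print0 / tar --null combination assumes GNU find and GNU tar. It does not delete the local files after upload; whether and when to do that depends on your compliance requirements.

```shell
#!/usr/bin/env bash
# Sketch: archive files older than AGE_DAYS and upload the archive to S3.
# DATASET, BUCKET and DRY_RUN are hypothetical placeholders, not values
# taken from the answer above.
set -euo pipefail

DATASET="${DATASET:-/path/to/zfs/dataset}"
BUCKET="${BUCKET:-your-bucket-name}"
AGE_DAYS="${AGE_DAYS:-180}"
TIME_TEST="${TIME_TEST:--mtime}"   # set to -atime to select by last access
DRY_RUN="${DRY_RUN:-1}"            # 1 = only print the upload command

# Print NUL-separated paths (relative to DATASET) of files older than AGE_DAYS.
list_old_files() {
  (cd "$DATASET" && find . -type f "$TIME_TEST" "+$AGE_DAYS" -print0)
}

# Pack the selected files into a gzip-compressed tarball at the path given in $1.
make_archive() {
  local archive="$1"
  list_old_files | tar --null -C "$DATASET" -czf "$archive" --files-from=-
}

# Upload the tarball directly into the Deep Archive storage class.
upload_archive() {
  local archive="$1"
  local cmd=(aws s3 cp "$archive" "s3://$BUCKET/path/to/archive/" --storage-class DEEP_ARCHIVE)
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "${cmd[*]}"   # show what would run
  else
    "${cmd[@]}"
  fi
}
```

A cron job could then source this file and call make_archive followed by upload_archive with the same archive path, with DRY_RUN=0 once the printed command looks right.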
Expert
answered 2 months ago
Expert
Artem
reviewed 1 month ago
Expert
reviewed 2 months ago
0

As the question is tagged with Amazon FSx for OpenZFS, what follows assumes that is where the data to be migrated is located (and not, say, a third-party on-prem OpenZFS product). In that case, AWS DataSync is the way to go.

https://aws.amazon.com/datasync/faqs/#Data_movement

Q: Where can I move data to and from?

A: DataSync supports the following storage location types: .... Amazon Simple Storage Service (Amazon S3), .... Amazon FSx for OpenZFS file systems

Even if your data is currently on-prem it may still be worth looking into.

Q: How do I use AWS DataSync to migrate data to AWS?

A: You can use AWS DataSync to migrate data located on premises, at the edge, or in other clouds to Amazon S3

The above mentions "plain" S3, but Glacier also gets a call-out in the same section of the FAQ.

Q: How do I use AWS DataSync to archive cold data?

A: You can use AWS DataSync to move cold data from on-premises storage systems directly to durable and secure long-term storage, such as Amazon S3 Glacier Flexible Retrieval (formerly S3 Glacier) or Amazon S3 Glacier Deep Archive.

Expert
Steve_M
answered 2 months ago
Expert
reviewed 1 month ago
  • I had checked DataSync. While it allows moving data between FSx and S3 (I did not test it), I did not find any option to specify a rule. My requirement is not just to move data between FSx and S3, but to archive files older than a certain age. Please let me know if my understanding is incorrect. Thanks
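
DataSync task filters match on path patterns rather than on file age, so one possible workaround is to compute the list of old files yourself and pass it as an include filter when starting the task. The sketch below has not been run against a real task. MOUNT_POINT and TASK_ARN are placeholders, and it assumes that the FSx volume is mounted where the script runs, that the mount point corresponds to the root of the task's source location, and that file names contain no "|", comma, or newline characters. The filter string is limited in length, so this only suits a modest number of files.

```shell
#!/usr/bin/env bash
# Sketch: build a DataSync include filter from files not accessed for AGE_DAYS.
# MOUNT_POINT and TASK_ARN are hypothetical placeholders.
set -euo pipefail

MOUNT_POINT="${MOUNT_POINT:-/mnt/fsx}"
TASK_ARN="${TASK_ARN:-arn:aws:datasync:REGION:ACCOUNT:task/task-EXAMPLE}"
AGE_DAYS="${AGE_DAYS:-180}"

# Emit "/dir/file1|/dir/file2|..." with paths relative to the mount point.
build_include_filter() {
  (cd "$MOUNT_POINT" && find . -type f -atime "+$AGE_DAYS" -print) \
    | sed 's|^\.||' \
    | sort \
    | paste -sd'|' -
}

# Print the command instead of running it, so it can be reviewed first.
print_start_command() {
  local filter
  filter="$(build_include_filter)"
  if [ -z "$filter" ]; then
    echo "no files older than $AGE_DAYS days" >&2
    return 1
  fi
  printf 'aws datasync start-task-execution --task-arn %q --includes %q\n' \
    "$TASK_ARN" "FilterType=SIMPLE_PATTERN,Value=$filter"
}
```

Replace -atime with -mtime in build_include_filter to select by modification time instead.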

  • I haven't tried it myself either, but according to https://docs.aws.amazon.com/datasync/latest/userguide/create-s3-location.html#using-storage-classes:

    New objects copied to an S3 bucket are stored using the storage class that you specify when creating your Amazon S3 transfer location.

    The steps to create the S3 transfer location and specify the storage class are at https://docs.aws.amazon.com/datasync/latest/userguide/create-s3-location.html#create-s3-location-how-to

    To create an Amazon S3 location

    1. Open the AWS DataSync console at https://console.aws.amazon.com/datasync/

    2. In the left navigation pane, expand Data transfer, then choose Locations and Create location.

    3. For Location type, choose Amazon S3.

    4. For S3 bucket, choose the bucket that you want to use as a location. (When creating your DataSync task later, you specify whether this location is a transfer source or destination.)

    5. For S3 storage class, choose a storage class that you want your objects to use when Amazon S3 is a transfer destination.
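
The console steps above have a CLI counterpart in aws datasync create-location-s3. The following is a minimal sketch, assuming that the bucket and an IAM role that DataSync can assume already exist. BUCKET_ARN, ROLE_ARN and STORAGE_CLASS are placeholder names, and the script only prints the command so that it can be reviewed before being run.

```shell
#!/usr/bin/env bash
# Sketch: print the CLI call that creates an S3 location with an archival
# storage class. BUCKET_ARN and ROLE_ARN are hypothetical placeholders.
set -euo pipefail

BUCKET_ARN="${BUCKET_ARN:-arn:aws:s3:::your-bucket-name}"
ROLE_ARN="${ROLE_ARN:-arn:aws:iam::ACCOUNT:role/your-datasync-role}"
STORAGE_CLASS="${STORAGE_CLASS:-DEEP_ARCHIVE}"   # GLACIER selects Flexible Retrieval

# Assemble the command as an array and print it on one line.
print_create_location() {
  local cmd=(
    aws datasync create-location-s3
    --s3-bucket-arn "$BUCKET_ARN"
    --s3-storage-class "$STORAGE_CLASS"
    --s3-config "BucketAccessRoleArn=$ROLE_ARN"
  )
  printf '%s\n' "${cmd[*]}"
}

print_create_location
```

The location ARN returned by the real command is what you would later supply as the destination when creating the DataSync task.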
