How to List S3 Objects Not Accessed in Last 3 Months (No Last Access Metadata in Inventory)


I'm currently using AWS S3 to store my objects. My storage usage has grown to around 50 TB, resulting in high costs. I want to identify unused objects — specifically, those that haven’t been accessed in the last 3 months.

I looked into S3 Inventory, which provides metadata like the Last Modified date, but I need the Last Accessed date instead. My goal is to generate a downloadable list of objects that haven't been accessed in the past 3 months.

How can I approach this?

asked 25 days ago, 52 views
1 Answer

To identify S3 objects that haven't been accessed in the last 3 months, you'll need to combine a few AWS services and features. Here's an approach you can take:

  1. Enable S3 Server Access Logging: This records the requests made to your bucket, including GET requests, which indicate when an object was last retrieved. The logs are delivered to a target bucket that you choose.

  2. Use S3 Inventory: While S3 Inventory doesn't provide last access information directly, it gives you a comprehensive list of your objects and their metadata.

  3. Analyze the data using Amazon Athena: You can use Athena to query both the S3 Inventory data and the Server Access Logs to determine which objects haven't been accessed in the last 3 months.

Here's a step-by-step process:

  1. Enable S3 Server Access Logging for your bucket.
  2. Set up S3 Inventory to generate reports of your objects.
  3. Create Athena tables for both the S3 Inventory data and the Server Access Logs.
  4. Use Athena to run a query that joins the Inventory data with the Access Logs. This query should filter for objects that don't have any GET requests in the last 3 months.
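
Steps 1 and 2 can also be scripted. The sketch below builds the request bodies for boto3's `put_bucket_logging` and `put_bucket_inventory_configuration` calls. The bucket names, prefixes, and the `stale-object-inventory` ID are placeholders chosen for illustration, and the daily CSV schedule is only one possible choice, not something the steps above prescribe.

```python
def build_logging_status(target_bucket, target_prefix):
    """Request body for put_bucket_logging."""
    return {
        "LoggingEnabled": {
            "TargetBucket": target_bucket,
            "TargetPrefix": target_prefix,
        }
    }


def build_inventory_config(config_id, destination_bucket, prefix):
    """Request body for put_bucket_inventory_configuration (daily CSV report)."""
    return {
        "Id": config_id,
        "IsEnabled": True,
        "IncludedObjectVersions": "Current",
        "Schedule": {"Frequency": "Daily"},
        "OptionalFields": ["Size", "LastModifiedDate"],
        "Destination": {
            "S3BucketDestination": {
                "Bucket": f"arn:aws:s3:::{destination_bucket}",
                "Format": "CSV",
                "Prefix": prefix,
            }
        },
    }


def enable_logging_and_inventory(s3_client, source_bucket, meta_bucket):
    """Turn on access logging and inventory for source_bucket.

    s3_client is expected to be a boto3 S3 client; meta_bucket (hypothetical
    name) receives both the logs and the inventory reports.
    """
    config_id = "stale-object-inventory"  # placeholder ID
    s3_client.put_bucket_logging(
        Bucket=source_bucket,
        BucketLoggingStatus=build_logging_status(meta_bucket, "access-logs/"),
    )
    s3_client.put_bucket_inventory_configuration(
        Bucket=source_bucket,
        Id=config_id,
        InventoryConfiguration=build_inventory_config(
            config_id, meta_bucket, "inventory/"
        ),
    )
```

With boto3 installed and credentials configured, this would be called as `enable_logging_and_inventory(boto3.client("s3"), "my-data-bucket", "my-meta-bucket")`. The destination bucket also needs a bucket policy that lets S3 write logs and inventory reports to it, which this sketch does not set up.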

An example Athena query might look like this:

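-- Assumes the access log table follows the schema from the AWS documentation,
-- where requestdatetime is a string such as '06/Feb/2019:00:00:38 +0000'.
SELECT
    inv.bucket,
    inv.key,
    inv.size,
    inv.last_modified_date,
    log.last_access_time
FROM
    inventory_table inv
LEFT JOIN (
    -- Most recent GET per object across all available logs
    SELECT
        bucket_name,
        key,
        MAX(parse_datetime(requestdatetime, 'dd/MMM/yyyy:HH:mm:ss Z')) AS last_access_time
    FROM
        access_log_table
    WHERE
        operation = 'REST.GET.OBJECT'
    GROUP BY
        bucket_name,
        key
) AS log
    ON log.bucket_name = inv.bucket
    AND log.key = inv.key
WHERE
    -- Never fetched, or last fetched more than 3 months ago.
    -- If the inventory table is partitioned by date, also restrict inv
    -- to the latest report so each object appears only once.
    log.last_access_time IS NULL
    OR log.last_access_time < date_add('month', -3, now());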

This query returns the inventory objects that have no logged GET request in the last 3 months. Athena writes query results to S3 as a CSV file, which you can download as your list.
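
To make the join logic concrete, here is a minimal Python sketch of the same idea on in-memory data: parse the leading fields of each access log record, keep the latest GET per object, and flag inventory entries with no GET inside the window. The function names are made up for this example, 90 days stands in for "3 months", and for 50 TB of data Athena remains the practical tool. This is only meant to show what the query computes.

```python
import csv
import io
import re
from datetime import datetime, timedelta, timezone

# Leading fields of a server access log record:
# owner, bucket, [time], remote IP, requester, request ID, operation, key
LOG_PREFIX = re.compile(
    r"^(\S+) (\S+) \[([^\]]+)\] (\S+) (\S+) (\S+) (\S+) (\S+)"
)


def parse_log_line(line):
    """Return (bucket, key, operation, timestamp), or None if the line does not match."""
    match = LOG_PREFIX.match(line)
    if match is None:
        return None
    timestamp = datetime.strptime(match.group(3), "%d/%b/%Y:%H:%M:%S %z")
    return match.group(2), match.group(8), match.group(7), timestamp


def last_get_times(log_lines):
    """Map (bucket, key) to the most recent REST.GET.OBJECT timestamp."""
    latest = {}
    for line in log_lines:
        record = parse_log_line(line)
        if record is None:
            continue
        bucket, key, operation, timestamp = record
        if operation != "REST.GET.OBJECT":
            continue
        if (bucket, key) not in latest or timestamp > latest[(bucket, key)]:
            latest[(bucket, key)] = timestamp
    return latest


def stale_objects(inventory, log_lines, now, days=90):
    """Return inventory (bucket, key) pairs with no GET in the last `days` days."""
    cutoff = now - timedelta(days=days)
    latest = last_get_times(log_lines)
    return [
        obj for obj in inventory
        if obj not in latest or latest[obj] < cutoff
    ]


def to_csv(rows):
    """Render (bucket, key) pairs as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["bucket", "key"])
    writer.writerows(rows)
    return buffer.getvalue()
```

Passing timezone-aware values for `now`, for example `datetime.now(timezone.utc)`, keeps the comparison with the parsed log timestamps valid.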

Remember that this approach has some limitations:

  • It relies on access logs, which only capture requests made after you enable logging, so the results are only meaningful once you have collected at least 3 months of logs.
  • Log delivery is best effort: records can arrive hours after the request, and a small number may never be delivered.
  • Analyzing large amounts of log data can be slow, and Athena charges by the amount of data scanned.
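
The first limitation can be checked before trusting any result. As a small sketch (the function name and the 90-day default are assumptions for illustration), compare the date logging was enabled against the start of the window:

```python
from datetime import datetime, timedelta, timezone


def logs_cover_window(logging_enabled_at, now, days=90):
    """True if logging has been on for the whole look-back window.

    If this returns False, "no GET in the logs" may only mean that the
    request happened before logging was enabled.
    """
    window_start = now - timedelta(days=days)
    return logging_enabled_at <= window_start
```

If the check fails, the safer reading of the query output is "unknown" rather than "unused" until enough log history has accumulated.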

Consider running this analysis periodically and using the results to inform your S3 Lifecycle policies for moving less frequently accessed data to cheaper storage tiers or for deletion.
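
One way to act on the list, sketched here as an assumption rather than a recommendation from the answer, is to tag the unused objects and let a tag-filtered lifecycle rule transition them. The tag `stale=true`, the 30-day value, and the `GLACIER` storage class are illustrative choices. Note that lifecycle `Days` counts from object creation, not from last access, so the tag is what carries the access information.

```python
def build_stale_rule(tag_key="stale", tag_value="true", days=30, storage_class="GLACIER"):
    """Lifecycle rule that transitions objects carrying the given tag."""
    return {
        "ID": f"transition-{tag_key}-{tag_value}",
        "Status": "Enabled",
        "Filter": {"Tag": {"Key": tag_key, "Value": tag_value}},
        "Transitions": [{"Days": days, "StorageClass": storage_class}],
    }


def tag_stale_objects(s3_client, stale, tag_key="stale", tag_value="true"):
    """Tag each (bucket, key) pair. put_object_tagging replaces any existing tags."""
    for bucket, key in stale:
        s3_client.put_object_tagging(
            Bucket=bucket,
            Key=key,
            Tagging={"TagSet": [{"Key": tag_key, "Value": tag_value}]},
        )


def apply_rule(s3_client, bucket, rule):
    """Install the rule. This call replaces the bucket's whole lifecycle configuration."""
    s3_client.put_bucket_lifecycle_configuration(
        Bucket=bucket,
        LifecycleConfiguration={"Rules": [rule]},
    )
```

Here `s3_client` is expected to be a boto3 S3 client. Because both calls overwrite existing tags or rules, merge with what is already configured before running this against a real bucket.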
Sources
Unlocking efficient data lifecycle management for Amazon S3 with last access times | AWS re:Post
Delete unused files from S3 | AWS re:Post

answered 25 days ago
