To identify S3 objects that haven't been accessed in the last 3 months, you'll need to combine a few AWS services and features. Here's an approach you can take:
- Enable S3 Server Access Logging: This records requests made to your bucket, including the GET requests that show when an object was last retrieved.
- Use S3 Inventory: S3 Inventory doesn't provide last access information directly, but it gives you a comprehensive list of your objects and their metadata.
- Analyze the data using Amazon Athena: You can use Athena to query both the S3 Inventory data and the Server Access Logs to determine which objects haven't been accessed in the last 3 months.
Here's a step-by-step process:
- Enable S3 Server Access Logging for your bucket.
- Set up S3 Inventory to generate reports of your objects.
- Create Athena tables for both the S3 Inventory data and the Server Access Logs.
- Use Athena to run a query that joins the Inventory data with the Access Logs. This query should filter for objects that don't have any GET requests in the last 3 months.
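The first two steps can also be scripted. The following is a minimal sketch using boto3, not a definitive setup: the bucket names, the `logs/` and `inventory` prefixes, and the `last-access-audit` configuration ID are hypothetical placeholders, and it assumes the destination buckets already have policies that let S3 deliver logs and inventory reports to them. The request payloads are built by plain functions so they can be reviewed before anything is sent to AWS.

```python
def logging_request(source_bucket, log_bucket, log_prefix):
    """Build keyword arguments for the S3 put_bucket_logging call."""
    return {
        "Bucket": source_bucket,
        "BucketLoggingStatus": {
            "LoggingEnabled": {
                "TargetBucket": log_bucket,
                "TargetPrefix": log_prefix,
            }
        },
    }


def inventory_request(source_bucket, dest_bucket, config_id="last-access-audit"):
    """Build keyword arguments for put_bucket_inventory_configuration."""
    return {
        "Bucket": source_bucket,
        "Id": config_id,
        "InventoryConfiguration": {
            "Id": config_id,  # must match the top-level Id
            "IsEnabled": True,
            "IncludedObjectVersions": "Current",
            "Schedule": {"Frequency": "Daily"},
            "OptionalFields": ["Size", "LastModifiedDate"],
            "Destination": {
                "S3BucketDestination": {
                    "Bucket": f"arn:aws:s3:::{dest_bucket}",  # ARN, not a bare name
                    "Format": "Parquet",
                    "Prefix": "inventory",  # hypothetical prefix
                }
            },
        },
    }


def apply(source_bucket, log_bucket, inventory_bucket):
    """Send both configurations to AWS (needs boto3 and credentials)."""
    import boto3  # imported here so the builders above work without boto3

    s3 = boto3.client("s3")
    s3.put_bucket_logging(
        **logging_request(source_bucket, log_bucket, f"logs/{source_bucket}/")
    )
    s3.put_bucket_inventory_configuration(
        **inventory_request(source_bucket, inventory_bucket)
    )
```

Calling `apply("my-data", "my-logs", "my-inventory")` with your own bucket names would enable both features. The Athena tables in the third step still have to be created separately.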
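An example Athena query might look like this. It assumes the access log table follows the schema from the AWS documentation, where requestdatetime is a string such as 06/Feb/2019:00:00:38 +0000 that has to be parsed before it can be compared with a timestamp:

    SELECT inv.bucket, inv.key, inv.size, inv.last_modified_date
    FROM inventory_table inv
    LEFT JOIN (
        SELECT key,
               MAX(parse_datetime(requestdatetime, 'dd/MMM/yyyy:HH:mm:ss Z')) AS last_access_time
        FROM access_log_table
        WHERE operation = 'REST.GET.OBJECT'
          AND parse_datetime(requestdatetime, 'dd/MMM/yyyy:HH:mm:ss Z') >= date_add('month', -3, now())
        GROUP BY key
    ) AS log ON log.key = inv.key
    WHERE log.last_access_time IS NULL;

This query lists the objects in the inventory that have no logged GET request in the last 3 months. The log.last_access_time column is left out of the SELECT list because it is always NULL for the rows that pass the final filter. Objects uploaded within the last 3 months but not yet read also appear in the result, so you may want to filter on inv.last_modified_date as well.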
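To make the join logic concrete, here is a small Python sketch of the same idea on in-memory sample data. It is only an illustration: the `stale_keys` function, the sample keys, and the 90-day approximation of "3 months" are assumptions made for this example, and real logs would be queried in Athena rather than loaded into Python.

```python
from datetime import datetime, timedelta, timezone

# Timestamp layout used in S3 server access logs, e.g. 06/Feb/2019:00:00:38 +0000
LOG_TIME_FORMAT = "%d/%b/%Y:%H:%M:%S %z"


def stale_keys(inventory_keys, log_records, now, days=90):
    """Return inventory keys with no GET request in the last `days` days.

    log_records is an iterable of (key, operation, request_time) tuples.
    """
    cutoff = now - timedelta(days=days)
    recently_read = set()
    for key, operation, request_time in log_records:
        if operation != "REST.GET.OBJECT":
            continue  # only reads count as access
        # Raw log lines wrap the timestamp in brackets; strip them if present.
        when = datetime.strptime(request_time.strip("[]"), LOG_TIME_FORMAT)
        if when >= cutoff:
            recently_read.add(key)
    # Equivalent of LEFT JOIN ... WHERE last_access_time IS NULL
    return [key for key in inventory_keys if key not in recently_read]


now = datetime(2024, 6, 1, tzinfo=timezone.utc)
logs = [
    ("reports/a.csv", "REST.GET.OBJECT", "[15/May/2024:10:00:00 +0000]"),
    ("reports/b.csv", "REST.GET.OBJECT", "[02/Jan/2024:08:30:00 +0000]"),
    ("reports/c.csv", "REST.PUT.OBJECT", "[20/May/2024:09:00:00 +0000]"),
]
inventory = ["reports/a.csv", "reports/b.csv", "reports/c.csv"]
print(stale_keys(inventory, logs, now))
# prints ['reports/b.csv', 'reports/c.csv']
```

`reports/b.csv` is reported because its only read is older than the cutoff, and `reports/c.csv` because a PUT does not count as an access.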
Remember that this approach has some limitations:
- It relies on access logs, which only start capturing data from when you enable them.
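- Server access logs are delivered on a best-effort basis, so there may be a delay between when an object is accessed and when it appears in the logs, and an occasional request may not be logged at all.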
- Analyzing large amounts of log data can be time-consuming and may incur additional costs.
Consider running this analysis periodically and using the results to inform your S3 Lifecycle policies for moving less frequently accessed data to cheaper storage tiers or for deletion.
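As a rough sketch of how the results could feed a Lifecycle policy, the snippet below builds a transition rule for each prefix where the analysis found stale objects. The prefix `archive/2022/`, the 90-day threshold, and the `STANDARD_IA` target are assumptions chosen for illustration. Note that Lifecycle transitions are based on object age rather than last access time, and that `put_bucket_lifecycle_configuration` replaces the bucket's existing Lifecycle configuration, so any current rules would need to be included as well.

```python
def transition_rule(prefix, days=90, storage_class="STANDARD_IA"):
    """Build one Lifecycle rule that transitions objects under `prefix`."""
    name = prefix.strip("/").replace("/", "-") or "all"
    return {
        "ID": f"transition-{name}",
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        # Days are counted from object creation, not from last access.
        "Transitions": [{"Days": days, "StorageClass": storage_class}],
    }


def lifecycle_request(bucket, prefixes):
    """Build keyword arguments for put_bucket_lifecycle_configuration."""
    return {
        "Bucket": bucket,
        "LifecycleConfiguration": {
            "Rules": [transition_rule(prefix) for prefix in prefixes]
        },
    }


def apply_lifecycle(bucket, prefixes):
    """Send the rules to AWS (needs boto3 and credentials)."""
    import boto3  # imported here so the builders above work without boto3

    s3 = boto3.client("s3")
    s3.put_bucket_lifecycle_configuration(**lifecycle_request(bucket, prefixes))
```

For example, `apply_lifecycle("my-data", ["archive/2022/"])` would move objects under that prefix to the cheaper tier once they are 90 days old.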
Sources
Unlocking efficient data lifecycle management for Amazon S3 with last access times | AWS re:Post
Delete unused files from S3 | AWS re:Post