Unlocking efficient data lifecycle management for Amazon S3 with last access times
This article shows you how to track the last access time of Amazon Simple Storage Service (Amazon S3) objects to optimize data storage. It also explains how to better manage your Amazon S3 storage and reduce unnecessary expenses.
Introduction
To keep costs under control, it's crucial to manage your cloud storage efficiently, especially when you use Amazon S3 to store large amounts of data. A challenge that many organizations face is determining how often an object is accessed, because this usage insight can affect cost optimization decisions. When you can identify objects that you haven't accessed for an extended period, you can make informed decisions about the following items:
- Data retention
- Lifecycle policies
- Storage class transitions
These decisions can translate into cost savings in the following ways:
- You can move infrequently accessed data to lower-cost storage classes
- You can delete unused data altogether
AWS Enterprise Support is always looking for ways to help customers save money and get the most value out of their Amazon S3 investments. Your designated Technical Account Manager (TAM) plays a crucial role in this effort and works with you to maximize the return on your AWS investment.
This article explores practical methods to determine when an S3 object was last accessed. This ability provides insights into your data usage and helps you streamline cloud storage management, all while you save money.
Challenges in determining the last access time
When you determine the last access time for S3 objects, the process isn't always straightforward. Unlike traditional file systems, Amazon S3 doesn't inherently track the "last accessed" timestamp. By default, Amazon S3 only captures the "last modified" date, which indicates when an object was last changed. This challenge can make it difficult to differentiate between data that is actively read and data that has been sitting idle without any recent access.
While Amazon S3 offers features like S3 Intelligent-Tiering and Storage Class Analysis, these options don't report access times for individual objects. Additionally, S3 Intelligent-Tiering doesn't monitor or automatically tier objects smaller than 128 KB, which keeps smaller objects out of scope. Without insight into the last time each object was accessed, organizations risk unnecessarily holding onto idle data, which can lead to higher storage costs.
To manage your data efficiently, it's essential to know when objects were last accessed. TAMs can work closely with you to identify these opportunities and help you optimize storage use and cost savings.
Use S3 Inventory and S3 server access logs to determine last access time
To track S3 object access times, you can use two AWS features: S3 Inventory and S3 server access logs. When you combine the data from these sources, you gain a complete view of both object metadata and access activity. You can use these insights to identify when each object was last accessed.
[Diagram: using S3 Inventory and S3 server access logs to track object access times]
S3 Inventory for metadata reporting
S3 Inventory is a feature that provides detailed reports on the contents of your S3 buckets, and can include the following metadata:
- Object keys
- Sizes
- Storage classes
- Last modified dates
S3 Inventory doesn't include access information, but it serves as a foundational dataset to understand your current storage landscape.
To configure Amazon Athena to query S3 Inventory, follow these two key steps:
- Configure your S3 Inventory: Set up S3 Inventory to generate reports for a specific bucket or prefix. For this process, you must define the report's scope, frequency, output format, and destination for the report storage.
- Configure Athena to query the S3 Inventory: To set up Athena to query the S3 Inventory reports, create an external table that points to the inventory data stored in Amazon S3. Then, you can run SQL queries on your bucket's metadata, such as object sizes, last modified dates, and storage classes.
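As a rough illustration of the first step, the report's scope, frequency, output format, and destination can be written down as a single configuration document. The sketch below builds one in Python, in the shape that the boto3 put_bucket_inventory_configuration call accepts. The bucket names, the prefix, and the configuration ID are hypothetical placeholders, and the call itself is left as a comment because it needs the boto3 package and AWS credentials.

```python
def build_inventory_config(config_id, destination_bucket_arn, prefix="inventory/"):
    """Return an S3 Inventory configuration as a plain dict."""
    return {
        "Id": config_id,
        "IsEnabled": True,
        # Scope: report only the current version of each object.
        "IncludedObjectVersions": "Current",
        # Frequency: "Daily" or "Weekly".
        "Schedule": {"Frequency": "Daily"},
        # Extra metadata columns to include in each report.
        "OptionalFields": ["Size", "LastModifiedDate", "StorageClass"],
        # Output format and destination for the report files.
        "Destination": {
            "S3BucketDestination": {
                "Bucket": destination_bucket_arn,
                "Format": "Parquet",
                "Prefix": prefix,
            }
        },
    }


# Hypothetical names, for illustration only.
config = build_inventory_config(
    "demo-inventory", "arn:aws:s3:::amzn-s3-demo-destination-bucket"
)

# With boto3 installed and credentials configured, the dict could be applied like this:
# import boto3
# boto3.client("s3").put_bucket_inventory_configuration(
#     Bucket="amzn-s3-demo-source-bucket",
#     Id=config["Id"],
#     InventoryConfiguration=config,
# )
```

A columnar format such as Parquet is chosen here on the assumption that it reduces the data Athena scans per query. CSV and ORC are the other formats that S3 Inventory supports.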
S3 server access logs for access tracking
To get insights into how frequently objects are accessed, you must turn on S3 server access logging. These logs record the requests made to your S3 bucket, and include the time of the request, the object accessed, and other request-specific details. Amazon S3 delivers the logs on a best-effort basis, so treat them as a close approximation of activity rather than a complete audit trail. When you combine this access data with the metadata from S3 Inventory, you can pinpoint the last time each object was accessed.
To configure Athena to query your S3 server access logs, complete the following steps:
- Configure the S3 server access logs: Set up S3 server access logs to record requests made to your S3 bucket.
- Configure Athena to query S3 server access logs: To set up Athena to query the S3 server access logs, create an external table that points to the log data stored in Amazon S3. You can use this configuration to run SQL queries and analyze access patterns.
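To make the log contents more concrete, the following minimal sketch parses the leading fields of one server access log record in Python: bucket owner, bucket, request time, remote IP, requester, request ID, operation, and key. The sample record is synthetic, and the function and pattern names are hypothetical. Later fields in the record are ignored.

```python
import re
from datetime import datetime

# Leading fields of a server access log record, in order: bucket owner, bucket,
# [time], remote IP, requester, request ID, operation, key.
LOG_PREFIX = re.compile(
    r'^(?P<owner>\S+) (?P<bucket>\S+) \[(?P<time>[^\]]+)\] (?P<ip>\S+) '
    r'(?P<requester>\S+) (?P<request_id>\S+) (?P<operation>\S+) (?P<key>\S+) '
)


def parse_access_record(line):
    """Return (bucket, key, operation, request time), or None if the line does not match."""
    match = LOG_PREFIX.match(line)
    if match is None:
        return None
    # Log timestamps look like 06/Feb/2019:00:00:38 +0000.
    request_time = datetime.strptime(match["time"], "%d/%b/%Y:%H:%M:%S %z")
    return match["bucket"], match["key"], match["operation"], request_time


# Synthetic record for illustration only; the IDs and names are made up.
sample = (
    'EXAMPLEOWNERID amzn-s3-demo-bucket [06/Feb/2019:00:00:38 +0000] 192.0.2.3 '
    'EXAMPLEREQUESTER 3E57427F3EXAMPLE REST.GET.OBJECT reports/2019/summary.csv '
    '"GET /amzn-s3-demo-bucket/reports/2019/summary.csv HTTP/1.1" 200 - 1024 1024 12 11 "-" "-" -'
)
record = parse_access_record(sample)
print(record)
```

The request time, operation, and key are the three fields that matter for last access tracking. An Athena table over the logs exposes the same fields as columns.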
Join S3 Inventory and S3 server access logs with Athena
After you configure Athena to query S3 Inventory and S3 server access logs, you can join the data to determine the last access time for objects. This approach provides a complete view of both the metadata and usage activity for your S3 objects.
To join the data from these sources, complete the following steps:
- Open the Athena console.
- In the Query Editor, run the following query:
SELECT
    inv.bucket,
    inv.key,
    inv.size,
    inv.last_modified_date,
    log.last_access_time
FROM
    demo_inventory inv
    -- LEFT JOIN keeps inventory objects that have no logged reads
    LEFT JOIN (
        SELECT
            key,
            date_format(
                -- Parse before MAX so that timestamps compare chronologically, not as text
                MAX(parse_datetime(requestdatetime, 'dd/MMM/yyyy:HH:mm:ss Z')),
                '%Y-%m-%dT%H:%i:%s.000Z'
            ) AS last_access_time
        FROM
            demo_access_log_partitioned
        WHERE
            operation = 'REST.GET.OBJECT'
        GROUP BY
            key
    ) AS log ON log.key = inv.key
WHERE
    -- Skip objects of 10 bytes or less
    try_cast(inv.size AS decimal) > 10
GROUP BY
    inv.bucket,
    inv.key,
    inv.size,
    inv.last_modified_date,
    log.last_access_time;
This query provides the following information about an object:
- The name of the S3 bucket
- The key (name) of the S3 object
- The size in bytes, as reported by S3 Inventory
- When the object was last modified
- The last time the object was accessed. If this column is empty, then no GET request for the object has been logged since S3 server access logging was turned on.
The subquery filters for 'REST.GET.OBJECT' operations. This filter captures only the read requests for S3 objects, so writes, listings, and other request types don't count as access, and the last access time reflects genuine usage activity.
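The same join logic can be prototyped on a handful of rows before you run it at scale. The sketch below, written in Python with hypothetical sample rows and function names, keeps every inventory row and attaches the latest GET time when one exists. It compares parsed timestamps rather than raw text, because log timestamps start with the day of the month and so don't sort chronologically as strings.

```python
from datetime import datetime

LOG_TIME_FORMAT = "%d/%b/%Y:%H:%M:%S %z"


def last_access_by_key(log_rows):
    """Map each key to its most recent REST.GET.OBJECT time, compared as datetimes."""
    latest = {}
    for key, operation, request_time_text in log_rows:
        if operation != "REST.GET.OBJECT":
            continue  # skip writes, listings, and other non-read requests
        request_time = datetime.strptime(request_time_text, LOG_TIME_FORMAT)
        if key not in latest or request_time > latest[key]:
            latest[key] = request_time
    return latest


def join_inventory(inventory_rows, log_rows):
    """Keep every inventory row; last access is None when no read was logged."""
    latest = last_access_by_key(log_rows)
    return [(key, size, latest.get(key)) for key, size in inventory_rows]


# Hypothetical sample rows: (key, size in bytes) and (key, operation, request time).
inventory = [("a.csv", 2048), ("b.csv", 512)]
logs = [
    ("a.csv", "REST.GET.OBJECT", "28/Jan/2024:10:00:00 +0000"),
    ("a.csv", "REST.GET.OBJECT", "03/Feb/2024:09:30:00 +0000"),
    ("b.csv", "REST.PUT.OBJECT", "05/Feb/2024:08:00:00 +0000"),
]
result = join_inventory(inventory, logs)
for row in result:
    print(row)
```

In this sample, a.csv resolves to the 3 February read even though the 28 January string sorts higher as text, and b.csv has no last access time because its only request was a write.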
When you can identify objects that haven't been accessed for an extended time, you can more easily optimize your storage and reduce your costs. This is an area where AWS Enterprise Support customers can achieve significant savings by right-sizing their Amazon S3 footprint.
Note: There are costs associated with the use of S3 Inventory, S3 server access logs, and Athena:
- S3 Inventory: When you generate inventory reports, you incur costs based on the number of objects in your bucket and the frequency of the reports.
- S3 server access logs: Storing access logs can generate additional costs based on the volume of requests logged.
- Athena: When you run queries in Athena, you incur charges based on the amount of data scanned and the total number of executions. To minimize costs, it's a best practice to run the query only on an ad hoc basis during the optimization process.
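For a rough sense of how scanned data and the number of runs drive the Athena charge, the following Python sketch multiplies the two by a per-terabyte price. The price is a caller-supplied assumption, not a quoted rate, and the data volumes are hypothetical. Check current Athena pricing for your AWS Region before you rely on any figure.

```python
def estimate_scan_cost(bytes_scanned, queries, usd_per_tb):
    """Rough cost of running `queries` queries that each scan `bytes_scanned` bytes."""
    # Treats 1 TB as 1024**4 bytes; this is an assumption of the sketch.
    tb_scanned = bytes_scanned * queries / 1024**4
    return tb_scanned * usd_per_tb


# Hypothetical inputs: 200 GiB scanned per run, 4 runs, assumed price of 5.00 USD per TB.
cost = estimate_scan_cost(200 * 1024**3, queries=4, usd_per_tb=5.00)
print(f"{cost:.2f} USD")
```

Because the cost scales with both factors, partitioning the log table and filtering on the partition column is one way to keep each run's scan small.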
Cost optimization opportunities
When you combine S3 Inventory and S3 server access logs, you can access the following cost optimization opportunities:
- Identify and delete unused data: Objects that haven't been accessed in an extended period are candidates for deletion, which frees up storage space and reduces costs.
- Optimize storage classes: For data that's not frequently accessed, you can move it to more cost-effective storage classes such as S3 Glacier or S3 Glacier Deep Archive.
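One way to act on the query results is to sort objects into follow-up actions by how long they have been idle. The Python sketch below is illustrative only: the 90-day and 365-day thresholds and the action labels are hypothetical, and the right values depend on your retention requirements and retrieval patterns.

```python
from datetime import datetime, timedelta, timezone


def recommend_action(last_access, now, archive_after_days=90, review_after_days=365):
    """Suggest a follow-up for one object based on how long it has been idle."""
    if last_access is None:
        # No read was logged since access logging was turned on.
        return "review-for-deletion"
    idle = now - last_access
    if idle >= timedelta(days=review_after_days):
        return "review-for-deletion"
    if idle >= timedelta(days=archive_after_days):
        return "archive"
    return "keep"


# Fixed reference time so that the example is reproducible.
now = datetime(2024, 6, 1, tzinfo=timezone.utc)
recent = recommend_action(datetime(2024, 5, 20, tzinfo=timezone.utc), now)
stale = recommend_action(datetime(2024, 1, 15, tzinfo=timezone.utc), now)
never_read = recommend_action(None, now)
print(recent, stale, never_read)
```

Objects flagged for review still need a human or policy check before deletion, because an empty last access time only means that no read was logged during the logging window.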
This article includes a few examples of how AWS customers can save money on their Amazon S3 storage. AWS Enterprise Support proactively surfaces cost optimization opportunities to help customers maximize the value of their cloud investments.
Conclusion
To make informed data management decisions and optimize storage costs, it's essential to determine the last access time of S3 objects. When you use S3 Inventory and S3 server access logs with Athena, you gain a comprehensive view of your storage landscape. With this view, you can identify and reduce unnecessary spending.
AWS Support engineers and TAMs can provide general guidance, best practices, troubleshooting, and operational support on AWS. To learn more about our plans and offerings, see AWS Support.
About the authors
Alonso de Cosio is a Principal TAM at AWS. In his role, he provides advocacy and strategic technical guidance to help customers use AWS best practices to plan and build solutions. He's passionate about using serverless technologies to build modular and scalable enterprise systems on AWS. Beyond work, Alonso enjoys spending time with his wife and dog, going to the beach, and traveling.
Sudheer Sangunni is a Senior TAM in AWS Enterprise Support. With his extensive expertise in the AWS Cloud and big data, Sudheer helps customers enhance their monitoring and observability capabilities within AWS.