Allocate Amazon S3 object costs by business unit using Storage Lens and Athena
Organizations using shared S3 buckets across multiple business units or projects face challenges in accurately allocating storage costs. This solution demonstrates how to implement automated cost allocation for S3 storage using Storage Lens and Athena.
Solution Overview
This solution uses the following AWS services:
- AWS Cost and Usage Reports (CUR) - Provides detailed cost data
- Amazon S3 Storage Lens - Offers storage analytics and grouping capabilities
- AWS Glue - Creates and maintains the data catalog
- Amazon Athena - Queries the cost and usage data
The solution works as follows:
- CUR delivers daily cost data to an S3 bucket
- S3 Storage Lens collects storage metrics and group information
- AWS Glue crawlers create and maintain table metadata
- Athena queries combine cost and usage data
AWS Service Requirements and Limitations
- Supported Regions: All regions where S3 Storage Lens is available
- S3 Storage Lens metrics may take up to 48 hours to appear
- Storage Lens advanced metrics have associated costs
- Athena query costs apply based on data scanned
Prerequisites
Before implementing this solution, ensure you have:
- An AWS account
- AWS Identity and Access Management (IAM) permissions for the following services:
- AWS Data Exports - Create Cost and Usage Reports
- Amazon S3 - Create and manage S3 buckets
- S3 Storage Lens - Create dashboards with advanced metrics enabled
- AWS Glue - Create databases, tables, and crawlers
- Athena - Start queries
Implementation Steps
1. Create AWS Data Exports
Review the documentation on how to create a standard data export with Cost and Usage Report 2.0 (CUR 2.0).
AWS Data Exports is used to create your Cost and Usage Report (CUR), which provides comprehensive cost and usage data. You'll use this data to analyze S3 storage costs.
Key configuration points:
- Use CUR 2.0 format for improved query performance
- Enable hourly granularity for more detailed analysis
- Include resource IDs to track individual bucket costs
- Enable data refresh for up-to-date reporting
Follow the CUR setup guide to create your export with these settings, or use the CLI sketch below.
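The export can also be created from the CLI with the AWS Data Exports API. The following is a minimal sketch, not a definitive configuration: the export name, bucket, prefix, and selected columns are illustrative assumptions, and the full CUR 2.0 column list is in the documentation.

# Sketch: create a CUR 2.0 export via AWS Data Exports.
# Export name, bucket, prefix, and column list are placeholders - adjust for your account.
aws bcm-data-exports create-export --export '{
  "Name": "s3-cost-allocation-cur2",
  "DataQuery": {
    "QueryStatement": "SELECT line_item_usage_start_date, line_item_blended_cost, line_item_resource_id, line_item_product_code, product_servicecode, line_item_operation FROM COST_AND_USAGE_REPORT",
    "TableConfigurations": {
      "COST_AND_USAGE_REPORT": {
        "TIME_GRANULARITY": "HOURLY",
        "INCLUDE_RESOURCES": "TRUE"
      }
    }
  },
  "DestinationConfigurations": {
    "S3Destination": {
      "S3Bucket": "cur2-us-east-1-<your_bucket>",
      "S3Prefix": "cur2",
      "S3Region": "us-east-1",
      "S3OutputConfigurations": {
        "Overwrite": "OVERWRITE_REPORT",
        "Format": "PARQUET",
        "Compression": "PARQUET",
        "OutputType": "CUSTOM"
      }
    }
  },
  "RefreshCadence": { "Frequency": "SYNCHRONOUS" }
}'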
2. Configure AWS Glue Data Catalog
AWS Glue Data Catalog serves as the metadata repository for your cost and usage data. We'll create a dedicated database to organize our tables:
aws glue create-database \
  --database-input '{
    "Name": "s3_cost_allocation_db",
    "Description": "Database for Cost and Usage Reports"
  }'
This database will store two main tables:
- CUR data table: Contains cost information
- Storage Lens table: Contains storage metrics
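Before moving on, you can confirm that the database was created:

# Returns the database metadata if creation succeeded
aws glue get-database --name "s3_cost_allocation_db"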
3. Create the Cost and Usage Report (CUR) table using a crawler
This command creates a Glue table using an AWS Glue crawler. The monthly schedule (cron(0 6 5 * ? *), which runs at 06:00 UTC on the fifth day of each month) ensures we capture new cost data partitions while minimizing crawler runs.
aws glue create-crawler \
  --name "cur_crawler" \
  --role "AWSGlueServiceRole" \
  --database-name "s3_cost_allocation_db" \
  --schedule "cron(0 6 5 * ? *)" \
  --targets '{
    "S3Targets": [
      { "Path": "s3://cur2-us-east-1-<your_bucket>/.../data/" }
    ]
  }' \
  --table-prefix "cur2_" \
  --description "Crawler for Cost and Usage Reports"
Run the crawler to create the table.
aws glue start-crawler --name "cur_crawler"
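Crawler runs take a few minutes. One way to check progress from the CLI:

# Poll the crawler state; READY means the run has finished
aws glue get-crawler --name "cur_crawler" --query 'Crawler.State' --output text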
If the CUR bucket is in a different account from the one where the solution is implemented, you must add the following resource policy to the bucket to allow Glue read access.
{ "Version": "2012-10-17", "Statement": [ { "Sid": "Labs-Glue", "Effect": "Allow", "Principal": { "AWS": "arn:aws:iam::<source_account>:root" }, "Action": [ "s3:GetObject", "s3:GetBucketLocation", "s3:ListBucket" ], "Resource": [ "arn:aws:s3:::cur2-us-east-1-<your_bucket>", "arn:aws:s3:::cur2-us-east-1-<your_bucket>/*" ] } ] }
4. Create Storage Lens configurations
Storage Lens helps organize and analyze your S3 usage. We'll create groups based on your organization's structure using either tags or prefixes.
If you are using the object tag approach, you might have tags like the following:
- Tag key: project
- Tag values: project_1, project_2, project_3, ... project_n
If you are using the prefix approach, you might have prefixes like the following:
- project_1/
- project_2/
- project_3/
- project_n/
The next AWS CLI commands show you how to create the groups according to your use case.
Per object tags:
aws s3control create-storage-lens-group \
  --account-id <your_account_id> \
  --storage-lens-group '{
    "Name": "project_1-tag-group",
    "Filter": {
      "MatchAnyTag": [
        { "Key": "project", "Value": "project_1" }
      ]
    }
  }'
Per prefixes:
aws s3control create-storage-lens-group \
  --account-id <your_account_id> \
  --storage-lens-group '{
    "Name": "project_1-prefix-group",
    "Filter": {
      "MatchAnyPrefix": ["project_1/"]
    }
  }'
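If you have many projects, a small shell loop can create one prefix group per project. This is a sketch assuming the project_1 ... project_n prefixes listed earlier:

# Create one Storage Lens group per project prefix
for project in project_1 project_2 project_3; do
  aws s3control create-storage-lens-group \
    --account-id <your_account_id> \
    --storage-lens-group "{\"Name\": \"${project}-prefix-group\", \"Filter\": {\"MatchAnyPrefix\": [\"${project}/\"]}}"
done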
Once the groups are created, create a Storage Lens dashboard with advanced metrics enabled and attach the groups to it. For this dashboard, we will export the metrics daily to an S3 bucket so we can query them later.
aws s3control put-storage-lens-configuration \
  --config-id S3-PrefixDashboard \
  --account-id $(aws sts get-caller-identity --query Account --output text) \
  --storage-lens-configuration '{
    "Id": "S3-PrefixDashboard",
    "IsEnabled": true,
    "AccountLevel": {
      "ActivityMetrics": { "IsEnabled": true },
      "AdvancedCostOptimizationMetrics": { "IsEnabled": true },
      "AdvancedDataProtectionMetrics": { "IsEnabled": true },
      "DetailedStatusCodesMetrics": { "IsEnabled": true },
      "BucketLevel": {
        "ActivityMetrics": { "IsEnabled": true },
        "AdvancedDataProtectionMetrics": { "IsEnabled": true },
        "AdvancedCostOptimizationMetrics": { "IsEnabled": true },
        "DetailedStatusCodesMetrics": { "IsEnabled": true },
        "PrefixLevel": {
          "StorageMetrics": {
            "IsEnabled": true,
            "SelectionCriteria": {
              "Delimiter": "/",
              "MaxDepth": 6,
              "MinStorageBytesPercentage": 3.0
            }
          }
        }
      },
      "StorageLensGroupLevel": {
        "SelectionCriteria": {
          "Include": [
            "arn:aws:s3:us-east-1:<account_id>:storage-lens-group/project_1-tag-group"
          ]
        }
      }
    },
    "DataExport": {
      "S3BucketDestination": {
        "AccountId": "<account_id>",
        "Arn": "arn:aws:s3:::s3-storage-lens-metrics-us-east-1-<your_bucket>",
        "Format": "Parquet",
        "OutputSchemaVersion": "V_1"
      },
      "CloudWatchMetrics": { "IsEnabled": false }
    }
  }'
This configuration can take up to 48 hours to begin sending metrics; after that, they are sent daily. You will see the Storage Lens group metrics once the first export arrives.
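While waiting for the first export, you can verify that the configuration was stored:

# Returns the dashboard configuration created above
aws s3control get-storage-lens-configuration \
  --config-id S3-PrefixDashboard \
  --account-id <your_account_id>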
Once the first metrics are exported, we can create the Glue Data Catalog table so that the information can be queried. The following command creates a crawler that runs daily at 06:00 UTC and uses the Storage Lens bucket as its data source.
aws glue create-crawler \
  --name "storage_lens_crawler" \
  --role "AWSGlueServiceRole" \
  --database-name "s3_cost_allocation_db" \
  --schedule "cron(0 6 * * ? *)" \
  --targets '{
    "S3Targets": [
      { "Path": "s3://s3-storage-lens-metrics-us-east-1-<your_bucket>/StorageLens/<account_id>/S3-PrefixDashboard/V_1/reports/" }
    ]
  }' \
  --table-prefix "storage_lens_" \
  --description "Crawler for Storage Lens Metrics"
aws glue start-crawler --name "storage_lens_crawler"
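After both crawlers have run, both tables should appear in the database:

# List the tables the crawlers created
aws glue get-tables \
  --database-name "s3_cost_allocation_db" \
  --query 'TableList[].Name'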
5. Use Amazon Athena to query the information
Amazon Athena is an interactive query service that makes it easy to analyze data directly in Amazon Simple Storage Service (Amazon S3) using standard SQL.
Now that the prerequisites are in place, we can extract insights from the data. This query shows the daily storage costs for each S3 bucket.
select
    line_item_usage_start_date as dt,
    sum(line_item_blended_cost) as line_item_blended_cost,
    line_item_resource_id as bucket_name
from "s3_cost_allocation_db"."cur2_data"
where line_item_product_code = 'AmazonS3'
    and product_servicecode = 'AmazonS3'
    and line_item_operation IN ('StandardStorage', 'IntelligentTieringStorage', 'GlacierStorage', 'DeepArchiveStorage', 'ReducedRedundancyStorage')
group by 1, 3
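Each of the queries in this section can be run from the Athena console, or submitted from the CLI as in the following sketch. The results bucket is a placeholder assumption, and the query string is abbreviated for space:

# Submit the query; <athena_results_bucket> is a placeholder for your results location
aws athena start-query-execution \
  --query-string "select line_item_usage_start_date ... from \"s3_cost_allocation_db\".\"cur2_data\" ..." \
  --query-execution-context Database=s3_cost_allocation_db \
  --result-configuration OutputLocation=s3://<athena_results_bucket>/ \
  --query 'QueryExecutionId' --output text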
This query provides daily storage metrics per bucket.
select
    date_parse(report_date, '%Y-%m-%d') dt,
    cast(metric_value as double) as totalBytes,
    bucket_name
from "s3_cost_allocation_db"."storage_lens_reports"
where metric_name = 'StorageBytes'
    and record_type = 'BUCKET'
This query analyzes storage usage by Storage Lens group.
select
    date_parse(report_date, '%Y-%m-%d') dt,
    sum(cast(metric_value as double)) as targetObjGroupBytes,
    bucket_name,
    record_value as objGroupArn
from "s3_cost_allocation_db"."storage_lens_reports"
where metric_name = 'StorageBytes'
    and record_type = 'STORAGE_LENS_GROUP_BUCKET'
group by 1, 3, 4
This comprehensive query combines all previous data to show costs per group.
with cost as (
    select
        line_item_usage_start_date as dt,
        sum(line_item_blended_cost) as line_item_blended_cost,
        line_item_resource_id as bucket_name
    from "s3_cost_allocation_db"."cur2_data"
    where line_item_product_code = 'AmazonS3'
        and product_servicecode = 'AmazonS3'
        and line_item_operation IN ('StandardStorage', 'IntelligentTieringStorage', 'GlacierStorage', 'DeepArchiveStorage', 'ReducedRedundancyStorage')
    group by 1, 3
),
total as (
    select
        date_parse(report_date, '%Y-%m-%d') dt,
        cast(metric_value as double) as totalBytes,
        bucket_name
    from "s3_cost_allocation_db"."storage_lens_reports"
    where metric_name = 'StorageBytes'
        and record_type = 'BUCKET'
),
target as (
    select
        date_parse(report_date, '%Y-%m-%d') dt,
        sum(cast(metric_value as double)) as targetObjGroupBytes,
        bucket_name,
        record_value as objGroupArn
    from "s3_cost_allocation_db"."storage_lens_reports"
    where metric_name = 'StorageBytes'
        and record_type = 'STORAGE_LENS_GROUP_BUCKET'
    group by 1, 3, 4
)
select
    target.dt,
    target.bucket_name,
    target.objGroupArn,
    split_part(target.objGroupArn, '/', 2) as objGroupName,
    cast(target.targetObjGroupBytes as decimal(30,2)) as targetObjGroupBytes,
    cast(total.totalBytes as decimal(30,2)) as totalBytes,
    cast((target.targetObjGroupBytes / total.totalBytes * 100) as decimal(5,2)) as percentUsed,
    cast(cost.line_item_blended_cost as decimal(20,8)) as totalCost,
    cast(cost.line_item_blended_cost * (target.targetObjGroupBytes / total.totalBytes) as decimal(20,8)) as objGroupCost
from target
join total
    on target.dt = total.dt
    and target.bucket_name = total.bucket_name
join cost
    on target.dt = cost.dt
    and target.bucket_name = cost.bucket_name
Cleanup
To avoid ongoing charges, delete the resources you created.
aws glue delete-crawler --name "storage_lens_crawler"
aws glue delete-crawler --name "cur_crawler"
aws glue delete-database --name "s3_cost_allocation_db"
- Disable Storage Lens advanced metrics and delete the dashboard configuration (see the commands below)
- Delete the Storage Lens groups
- Disable the CUR data export if it is no longer needed
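The Storage Lens resources can be removed from the CLI. The configuration ID and group name below match the examples in this article; adjust them and the account ID placeholder as needed, and repeat the group deletion once per group.

# Delete the Storage Lens dashboard configuration created earlier
aws s3control delete-storage-lens-configuration \
  --config-id S3-PrefixDashboard \
  --account-id <your_account_id>

# Delete each Storage Lens group (repeat per group)
aws s3control delete-storage-lens-group \
  --name "project_1-tag-group" \
  --account-id <your_account_id>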
Summary
This solution enables automated S3 storage cost allocation across business units using Storage Lens groups and Athena queries. Organizations can now track, analyze, and report storage costs accurately at the business unit level.
Next Steps
- Customize the Athena queries for your specific reporting needs
- Add additional Storage Lens groups for finer-grained cost allocation
- Explore creating interactive QuickSight dashboards to visualize the Athena-powered cost analysis