To track and identify the costs of data transfer from Amazon S3 to the internet, we can enable S3 server access logging and query the resulting logs with Amazon Athena.
Enable S3 Server Access Logging
1. Open the Amazon S3 console.
2. Select the source bucket for which you want to enable logging.
3. Click on the "Properties" tab.
4. Scroll to the "Server access logging" section and click "Edit".
5. Enable server access logging by checking the box.
6. Choose a target bucket for the logs (create a new bucket if necessary).
7. Optionally, specify a target prefix for log objects.
8. Save changes.
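The console steps above can also be scripted. The following is a minimal sketch, assuming the boto3 SDK is installed, credentials are configured, and the target bucket already allows the S3 logging service to write to it. The function names and the bucket names in the usage note are hypothetical and not part of the original steps.

```python
def build_logging_status(target_bucket, target_prefix=""):
    """Build the BucketLoggingStatus payload expected by put_bucket_logging."""
    return {
        "LoggingEnabled": {
            "TargetBucket": target_bucket,
            "TargetPrefix": target_prefix,
        }
    }


def enable_access_logging(source_bucket, target_bucket, target_prefix=""):
    """Enable server access logging on source_bucket (makes a real AWS call)."""
    # Imported here so the payload helper above works without boto3 installed.
    import boto3

    s3 = boto3.client("s3")
    s3.put_bucket_logging(
        Bucket=source_bucket,
        BucketLoggingStatus=build_logging_status(target_bucket, target_prefix),
    )
```

For example, `enable_access_logging("my-source-bucket", "my-log-bucket", "access-logs/")` would correspond to steps 2 through 8, with placeholder names that you would replace with your own.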
Create an Athena Table for S3 Access Logs
1. Open the Amazon Athena console.
2. Ensure you have a database selected or create a new one.
3. Use the following SQL command to create a table for your S3 access logs:
CREATE EXTERNAL TABLE IF NOT EXISTS `s3_access_logs_your_bucket` (
bucketowner STRING,
bucket STRING,
requestdatetime STRING,
remoteip STRING,
requester STRING,
requestid STRING,
operation STRING,
key STRING,
request_uri STRING,
httpstatus STRING,
errorcode STRING,
bytessent BIGINT,
objectsize BIGINT,
totaltime STRING,
turnaroundtime STRING,
referrer STRING,
useragent STRING,
versionid STRING,
hostid STRING,
sigv STRING,
ciphersuite STRING,
authtype STRING,
endpoint STRING,
tlsversion STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
'serialization.format' = '1',
'input.regex' = '([^ ]*) ([^ ]*) \\[(.*?)\\] ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) (\"[^\"]*\") (-|[0-9]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) (\"[^\"]*\") ([^ ]*)(?: ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*))?.*$'
)
LOCATION 's3://your-bucket/';
Replace 's3://your-bucket/' with the S3 location (log bucket and prefix) where your access logs are delivered.
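To see how the 'input.regex' pattern maps one log line onto the table columns, here is a small Python sketch. It uses the same pattern with the SQL string escaping removed. The sample log line is made up for illustration (owner ID, request ID, and host ID are placeholders), and `parse_log_line` is a hypothetical helper, not something Athena provides.

```python
import re

# Column names in the same order as the CREATE EXTERNAL TABLE statement.
COLUMNS = [
    "bucketowner", "bucket", "requestdatetime", "remoteip", "requester",
    "requestid", "operation", "key", "request_uri", "httpstatus",
    "errorcode", "bytessent", "objectsize", "totaltime", "turnaroundtime",
    "referrer", "useragent", "versionid", "hostid", "sigv", "ciphersuite",
    "authtype", "endpoint", "tlsversion",
]

# Same pattern as 'input.regex', written as a Python raw string.
LOG_PATTERN = re.compile(
    r'([^ ]*) ([^ ]*) \[(.*?)\] ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) '
    r'("[^"]*") (-|[0-9]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) '
    r'("[^"]*") ([^ ]*)(?: ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*))?.*$'
)

# Made-up sample line in the server access log layout (placeholder IDs).
SAMPLE_LINE = (
    'EXAMPLEOWNERID example-bucket [06/Feb/2019:00:00:38 +0000] 192.0.2.3 '
    'EXAMPLEREQUESTER 3E57427F3EXAMPLE REST.GET.OBJECT photos/cat.jpg '
    '"GET /photos/cat.jpg HTTP/1.1" 200 - 2662992 2662992 70 10 "-" '
    '"curl/7.15.1" - EXAMPLEHOSTID= SigV4 ECDHE-RSA-AES128-GCM-SHA256 '
    'AuthHeader example-bucket.s3.us-west-1.amazonaws.com TLSv1.2'
)


def parse_log_line(line):
    """Return a dict of column name to raw string value, or None if no match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    return dict(zip(COLUMNS, match.groups()))


row = parse_log_line(SAMPLE_LINE)
print(row["operation"], row["key"], row["bytessent"])
```

All values come back as strings here; in the Athena table, bytessent and objectsize are declared as BIGINT so they can be summed directly.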
Verify the Table:
Run a simple query to ensure the table is correctly set up:
SELECT * FROM s3_access_logs_your_bucket LIMIT 10;
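If you prefer to run the verification query from a script rather than the console, the sketch below shows one way to do it. It assumes you pass in an Athena client created with `boto3.client("athena")`, and that the database name and the S3 output location for query results are your own values. `run_athena_query` is a hypothetical helper name, and it reads only the first page of results.

```python
import time


def run_athena_query(athena, sql, database, output_location, poll_seconds=1.0):
    """Start a query, wait for it to finish, and return the first page of rows."""
    start = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = start["QueryExecutionId"]

    # Poll until Athena reports a terminal state.
    while True:
        status = athena.get_query_execution(QueryExecutionId=query_id)
        state = status["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(poll_seconds)

    if state != "SUCCEEDED":
        raise RuntimeError(f"Athena query {query_id} ended in state {state}")

    result = athena.get_query_results(QueryExecutionId=query_id)
    # Each row is a list of cells; the first row holds the column headers.
    return [
        [cell.get("VarCharValue") for cell in row["Data"]]
        for row in result["ResultSet"]["Rows"]
    ]
```

Passing the client in as an argument keeps the helper easy to exercise with a stub before pointing it at a real account.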
Run query
Once the table is verified, we can query it to estimate the data transfer cost per object.
SELECT
bucketowner,
bucket,
key,
MAX(objectsize) as object_size_bytes,
ROUND(MAX(objectsize) / 1024.0 / 1024.0 / 1024.0, 2) as object_size_gb,
SUM(bytessent) as total_bytes_transferred,
ROUND(SUM(bytessent) / 1024.0 / 1024.0 / 1024.0, 2) as total_gb_transferred,
COUNT(*) as request_count,
-- Standard S3 data transfer pricing to internet
ROUND((SUM(bytessent) / 1024.0 / 1024.0 / 1024.0) * 0.09, 2) as estimated_cost_usd
FROM s3_access_logs_your_bucket
WHERE operation = 'REST.GET.OBJECT'
AND requester != 'Amazon'
GROUP BY
bucketowner,
bucket,
key
ORDER BY estimated_cost_usd DESC
LIMIT 100;
Note: For this example, all data is in the S3 Standard storage class.
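The arithmetic behind the query can be checked by hand. The sketch below mirrors the aggregation in plain Python: it sums bytes sent per key for REST.GET.OBJECT requests, converts to GB using 1024³ bytes, and applies the assumed $0.09 per GB rate. The records and the key name `example.mov` are illustrative, the function name is hypothetical, and the requester filter from the query is left out for brevity.

```python
from collections import defaultdict

PRICE_PER_GB_USD = 0.09      # assumed flat rate, same as in the query
BYTES_PER_GB = 1024 ** 3     # the query divides by 1024 three times


def estimate_transfer_costs(records, price_per_gb=PRICE_PER_GB_USD):
    """Aggregate GET traffic per key and estimate its transfer cost."""
    totals = defaultdict(lambda: {"bytes": 0, "requests": 0})
    for rec in records:
        if rec["operation"] != "REST.GET.OBJECT":
            continue  # same filter as the WHERE clause
        entry = totals[rec["key"]]
        entry["bytes"] += rec["bytessent"]
        entry["requests"] += 1

    report = []
    for key, entry in totals.items():
        gb = entry["bytes"] / BYTES_PER_GB
        report.append({
            "key": key,
            "total_bytes_transferred": entry["bytes"],
            "total_gb_transferred": round(gb, 2),
            "request_count": entry["requests"],
            "estimated_cost_usd": round(gb * price_per_gb, 2),
        })
    # Highest estimated cost first, like ORDER BY ... DESC.
    report.sort(key=lambda r: r["estimated_cost_usd"], reverse=True)
    return report


# Five downloads of one 1,631,728,498-byte object plus one upload to ignore.
records = [
    {"operation": "REST.GET.OBJECT", "key": "example.mov", "bytessent": 1631728498}
    for _ in range(5)
] + [{"operation": "REST.PUT.OBJECT", "key": "example.mov", "bytessent": 0}]

report = estimate_transfer_costs(records)
print(report[0])
```

With these inputs the totals come to 8,158,642,490 bytes, about 7.6 GB, and an estimated 0.68 USD across 5 requests.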
Important Note on Query Results:
The output is sorted by the 'estimated_cost_usd' column in descending order, which means the objects are ranked from highest to lowest data transfer cost.
| bucketowner | bucket | key | object_size_bytes | object_size_gb | total_bytes_transferred | total_gb_transferred | request_count | estimated_cost_usd |
|---|---|---|---|---|---|---|---|---|
| -** | --**** | ********.mov | 1631728498 | 1.52 | 8158642490 | 7.6 | 5 | 0.68 |
| -** | --**** | ********.rtf | 1046 | 0 | 9592 | 0 | 11 | 0 |
| -** | --**** | ********.csv | 872 | 0 | 2092 | 0 | 2 | 0 |
Column explanation:
bucketowner - Hashed canonical ID of the bucket owner's AWS account
bucket - Name of the S3 bucket
key - Name/path of the object in the bucket
object_size_bytes - Size of the object in bytes
object_size_gb - Size of the object in GB, rounded to 2 decimals
total_bytes_transferred - Total bytes sent for this object
total_gb_transferred - Total bytes converted to GB (bytes/1024³), rounded to 2 decimals
request_count - Number of times this object was requested
estimated_cost_usd - Estimated cost using $0.09 per GB transfer rate, rounded to 2 decimals
With this, we can track and identify the costs of data transfer from Amazon S3 to the internet, broken down by individual object.
DISCLAIMER
The method described above provides an estimation of data transfer costs based on standard AWS S3 pricing ($0.09 per GB). Please note that actual costs may vary depending on:
- Your specific AWS S3 pricing tier
- Any custom pricing agreements you may have with AWS
- Data transfer destination (e.g., different regions or to CloudFront)
- Volume discounts
- Free tier eligibility
For the most accurate cost analysis, please refer to your AWS billing dashboard and consider consulting with AWS support or a cloud cost management specialist. This method should be used as a guide to identify potential high-cost objects rather than for precise financial reporting.
