How to Determine Data Transfer Costs from AWS S3 to the Internet by File/Object

Hello,

I'm seeking assistance in analyzing my AWS S3 data transfer costs. Specifically, I would like to understand how to track and identify the costs associated with data transfer from AWS S3 to the internet, broken down by individual files or objects. My goal is to determine which specific files contribute the most to my data transfer expenses. Ideally, I would like a ranking of objects by data transfer cost.

Any guidance, tools, or scripts that you can provide will be greatly appreciated, as I aim to optimize my S3 usage and reduce unnecessary data transfer costs.

Thank you for your assistance.

1 Answer

AWS billing does not break data transfer charges down by object, but you can estimate them per object: enable S3 server access logging on the bucket, then query the logs with Athena to sum the bytes sent for each object.

Enable S3 Server Access Logging

1. Open the Amazon S3 console.
2. Select the source bucket for which you want to enable logging. 
3. Click on the "Properties" tab. 
4. Scroll to the "Server access logging" section and click "Edit".
5. Enable server access logging by checking the box. 
6. Choose a target bucket for the logs (create a new bucket if necessary). 
7. Optionally, specify a target prefix for log objects. 
8. Save changes.
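
If you prefer to script this step instead of using the console, the following is a minimal sketch using boto3's `put_bucket_logging` call. The bucket names and prefix are hypothetical placeholders, and the sketch assumes the target bucket already allows S3 log delivery to write to it (the console normally sets that up for you; the API call does not).

```python
def build_logging_status(target_bucket: str, target_prefix: str = "") -> dict:
    """Build the BucketLoggingStatus payload expected by put_bucket_logging."""
    return {
        "LoggingEnabled": {
            "TargetBucket": target_bucket,   # bucket that receives the log objects
            "TargetPrefix": target_prefix,   # optional key prefix for log objects
        }
    }


def enable_access_logging(source_bucket: str, target_bucket: str,
                          target_prefix: str = "") -> None:
    """Turn on server access logging for source_bucket.

    Assumption: credentials are configured and the target bucket already
    grants the S3 logging service permission to write log objects.
    """
    import boto3  # third-party dependency, imported only when actually called

    s3 = boto3.client("s3")
    s3.put_bucket_logging(
        Bucket=source_bucket,
        BucketLoggingStatus=build_logging_status(target_bucket, target_prefix),
    )


# Hypothetical usage (names are placeholders, not real buckets):
# enable_access_logging("my-source-bucket", "my-log-bucket", "access-logs/")
```

Logs are delivered on a best-effort basis and can take a while to appear, so allow some time before querying them.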

Create an Athena Table for S3 Access Logs

1. Open the Amazon Athena console. 
2. Ensure you have a database selected or create a new one. 
3. Use the following SQL command to create a table for your S3 access logs:
CREATE EXTERNAL TABLE IF NOT EXISTS `s3_access_logs_your_bucket` (
    bucketowner STRING,
    bucket STRING,
    requestdatetime STRING,
    remoteip STRING,
    requester STRING,
    requestid STRING,
    operation STRING,
    key STRING,
    request_uri STRING,
    httpstatus STRING,
    errorcode STRING,
    bytessent BIGINT,
    objectsize BIGINT,
    totaltime STRING,
    turnaroundtime STRING,
    referrer STRING,
    useragent STRING,
    versionid STRING,
    hostid STRING,
    sigv STRING,
    ciphersuite STRING,
    authtype STRING,
    endpoint STRING,
    tlsversion STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
    'serialization.format' = '1',
    'input.regex' = '([^ ]*) ([^ ]*) \\[(.*?)\\] ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) (\"[^\"]*\") (-|[0-9]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) (\"[^\"]*\") ([^ ]*)(?: ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*))?.*$'
)
LOCATION 's3://your-bucket/';

Replace 's3://your-bucket/' with the S3 URI of your log bucket, including the target prefix if you set one.
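
To see how the regular expression maps a log line onto the table columns, here is a small Python sketch that applies the same pattern (with the Hive string escaping removed) to one line. The sample line is made up for illustration; real log lines carry your own account IDs, request IDs, and endpoints.

```python
import re

# Column names in the same order as the CREATE TABLE statement above.
COLUMNS = [
    "bucketowner", "bucket", "requestdatetime", "remoteip", "requester",
    "requestid", "operation", "key", "request_uri", "httpstatus",
    "errorcode", "bytessent", "objectsize", "totaltime", "turnaroundtime",
    "referrer", "useragent", "versionid", "hostid", "sigv", "ciphersuite",
    "authtype", "endpoint", "tlsversion",
]

# Same pattern as 'input.regex', written as a Python raw string.
LOG_LINE_RE = re.compile(
    r'([^ ]*) ([^ ]*) \[(.*?)\] ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) '
    r'("[^"]*") (-|[0-9]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) '
    r'("[^"]*") ([^ ]*)'
    r'(?: ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*))?.*$'
)


def parse_log_line(line: str):
    """Return a dict of column name -> raw string value, or None if no match."""
    match = LOG_LINE_RE.match(line)
    if match is None:
        return None
    return dict(zip(COLUMNS, match.groups()))


# Hypothetical log line, shaped like an S3 server access log record.
SAMPLE_LINE = (
    'ownerid examplebucket [06/Feb/2019:00:00:38 +0000] 192.0.2.3 requesterid '
    '3E57427F3EXAMPLE REST.GET.OBJECT videos/demo.mov '
    '"GET /videos/demo.mov HTTP/1.1" 200 - 1048576 1048576 70 10 "-" '
    '"curl/7.15.1" - hostidEXAMPLE= SigV4 ECDHE-RSA-AES128-GCM-SHA256 '
    'AuthHeader examplebucket.s3.us-west-1.amazonaws.com TLSv1.2'
)

record = parse_log_line(SAMPLE_LINE)
print(record["operation"], record["key"], record["bytessent"])
```

The values come back as strings; Athena casts `bytessent` and `objectsize` to BIGINT, and a `-` in those fields becomes NULL.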

Verify the Table:

Run a simple query to ensure the table is correctly set up:

SELECT * FROM s3_access_logs_your_bucket LIMIT 10;

Run the Query

Once the table is verified, run the following query to rank objects by estimated data transfer cost:

SELECT 
    bucketowner,
    bucket,
    key,
    MAX(objectsize) as object_size_bytes,
    ROUND(MAX(objectsize) / 1024.0 / 1024.0 / 1024.0, 2) as object_size_gb,
    SUM(bytessent) as total_bytes_transferred,
    ROUND(SUM(bytessent) / 1024.0 / 1024.0 / 1024.0, 2) as total_gb_transferred,
    COUNT(*) as request_count,
    -- Standard S3 data transfer pricing to internet
    ROUND((SUM(bytessent) / 1024.0 / 1024.0 / 1024.0) * 0.09, 2) as estimated_cost_usd
FROM s3_access_logs_your_bucket
WHERE operation = 'REST.GET.OBJECT'
  AND requester != 'Amazon'
GROUP BY 
    bucketowner, 
    bucket, 
    key
ORDER BY estimated_cost_usd DESC
LIMIT 100;
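
For readers who want to check the arithmetic outside Athena, the following Python sketch mirrors the query's GROUP BY, SUM, and cost estimate over already-parsed log records. The record layout (dicts keyed by the table's column names) and the function name are assumptions made for this example, and the $0.09 per GB rate is the same flat rate the query hard-codes.

```python
from collections import defaultdict

BYTES_PER_GB = 1024 ** 3
RATE_USD_PER_GB = 0.09  # flat rate assumed by the query; adjust for your pricing


def rank_objects_by_transfer(records, rate=RATE_USD_PER_GB):
    """Aggregate GET traffic per (bucket, key) and rank by bytes transferred."""
    totals = defaultdict(lambda: {"bytes": 0, "requests": 0})
    for rec in records:
        # Same filters as the WHERE clause of the query.
        if rec.get("operation") != "REST.GET.OBJECT":
            continue
        if rec.get("requester") == "Amazon":
            continue
        agg = totals[(rec["bucket"], rec["key"])]
        agg["requests"] += 1                      # COUNT(*)
        sent = rec.get("bytessent")
        if sent not in (None, "", "-"):           # SUM() skips NULL values
            agg["bytes"] += int(sent)

    ranked = []
    for (bucket, key), agg in totals.items():
        gb = agg["bytes"] / BYTES_PER_GB
        ranked.append({
            "bucket": bucket,
            "key": key,
            "total_bytes": agg["bytes"],
            "total_gb": round(gb, 2),
            "request_count": agg["requests"],
            "estimated_cost_usd": round(gb * rate, 2),
        })
    # Sort on raw bytes so objects whose cost rounds to 0.00 still rank sensibly.
    ranked.sort(key=lambda row: row["total_bytes"], reverse=True)
    return ranked
```

Sorting on the unrounded byte total rather than the rounded cost is a deliberate choice in this sketch: many small objects round to the same cost, and the byte total keeps their relative order meaningful.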

Note: this example assumes all objects are in the S3 Standard storage class.

Important Note on Query Results:

The output is sorted by the 'estimated_cost_usd' column in descending order, which means the objects are ranked from highest to lowest data transfer cost.

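Sample output (identifying values redacted):

bucketowner | bucket | key | object_size_bytes | object_size_gb | total_bytes_transferred | total_gb_transferred | request_count | estimated_cost_usd
(redacted) | (redacted) | ************.mov | 1631728498 | 1.52 | 8158642490 | 7.6 | 5 | 0.68

The sample also returned one .rtf object and one .csv object; for both, the transferred volume rounds to 0 GB and the estimated cost to 0.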

Column explanation:

bucketowner - Canonical user ID of the bucket owner's AWS account

bucket - Name of the S3 bucket

key - Name/path of the object in the bucket

object_size_bytes - Size of the object in bytes

object_size_gb - Size of the object in GB, rounded to 2 decimals

total_bytes_transferred - Total bytes sent for this object

total_gb_transferred - Total bytes converted to GB (bytes/1024³), rounded to 2 decimals

request_count - Number of GET requests logged for this object

estimated_cost_usd - Estimated cost using $0.09 per GB transfer rate, rounded to 2 decimals

With this, we can estimate the cost of data transfer from AWS S3 to the internet for individual files or objects and rank them from most to least expensive.

DISCLAIMER
The method described above estimates data transfer costs using a flat rate of $0.09 per GB, which is the rate hard-coded in the query. Please note that actual costs may vary depending on:

  1. Your specific AWS S3 pricing tier

  2. Any custom pricing agreements you may have with AWS

  3. Data transfer destination (e.g., different regions or to CloudFront)

  4. Volume discounts

  5. Free tier eligibility

For the most accurate cost analysis, please refer to your AWS billing dashboard and consider consulting with AWS support or a cloud cost management specialist. This method should be used as a guide to identify potential high-cost objects rather than for precise financial reporting.

AWS
answered 2 years ago
