Hello,
Please try this solution.
Step 1: Ensure the CUR includes the necessary columns
Make sure your CUR includes the following columns:
line_item_usage_account_id, line_item_resource_id, line_item_usage_type, line_item_usage_amount, line_item_product_code
Step 2: Download the CUR file
Download the CUR file from your S3 bucket to your local machine or directly to your processing environment.
Step 3: Install Pandas
If you don't have Pandas installed, you can install it using pip:
pip install pandas
Step 4: Simple Python script
Here's a Python script to parse the CUR file and calculate the total bytes transferred for each CloudFront distribution:
import pandas as pd
# Load the CUR file into a DataFrame
cur_file_path = 'path_to_your_cur_file.csv'  # Update this path (read_csv also reads .csv.gz directly)
df = pd.read_csv(cur_file_path)
# Filter rows relevant to CloudFront data transfer
cloudfront_df = df[
    (df['line_item_product_code'] == 'AmazonCloudFront') &
    (df['line_item_usage_type'].str.contains('DataTransfer', na=False))
]
# Aggregate the bytes transferred per distribution
bytes_transferred = cloudfront_df.groupby('line_item_resource_id')['line_item_usage_amount'].sum().reset_index()
# Rename columns for readability; despite the usage-type name, the CUR
# reports DataTransfer-Out-Bytes usage in GB, not bytes
bytes_transferred.columns = ['Distribution ID', 'Total Data Transferred (GB)']
# Save the result to a CSV file
output_file_path = 'cloudfront_bytes_transferred.csv'
bytes_transferred.to_csv(output_file_path, index=False)
print(f"Data transferred per distribution has been saved to {output_file_path}")
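As a quick sanity check, the same filter-and-aggregate logic can be exercised on a tiny in-memory sample (the distribution ID, usage types, and amounts below are invented for illustration):

```python
import pandas as pd

# Toy CUR rows: two CloudFront data-transfer lines for one distribution,
# one CloudFront request line, and one non-CloudFront line.
df = pd.DataFrame({
    'line_item_product_code': ['AmazonCloudFront', 'AmazonCloudFront',
                               'AmazonCloudFront', 'AmazonS3'],
    'line_item_usage_type':   ['US-DataTransfer-Out-Bytes',
                               'EU-DataTransfer-Out-Bytes',
                               'US-Requests-Tier1',
                               'TimedStorage-ByteHrs'],
    'line_item_resource_id':  ['E1ABC', 'E1ABC', 'E1ABC', 'my-bucket'],
    'line_item_usage_amount': [10.0, 5.0, 1000.0, 2.0],
})

# Same filter as the script above: CloudFront data-transfer rows only
cloudfront_df = df[
    (df['line_item_product_code'] == 'AmazonCloudFront') &
    (df['line_item_usage_type'].str.contains('DataTransfer', na=False))
]
totals = cloudfront_df.groupby('line_item_resource_id')['line_item_usage_amount'].sum()
print(totals.to_dict())  # the two data-transfer rows sum to 15.0 for E1ABC
```

The request row and the S3 row are dropped by the filter, so only the two data-transfer lines are summed.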
Steps to Run the Script:
Download your CUR file from your S3 bucket and note the path where it's saved.
Update the script:
- Replace 'path_to_your_cur_file.csv' with the actual path to your CUR file.
- (Optional) Change the output_file_path if you want the results saved to a different location or with a different name.
Run the script:
Save the script to a file, for example, cloudfront_usage.py.
Execute the script using Python:
python cloudfront_usage.py
https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/reports-billing.html
https://docs.aws.amazon.com/cur/latest/userguide/use-cases.html
https://docs.aws.amazon.com/cur/latest/userguide/cur-data-transfers-charges.html
You will need an export containing the following columns from the CUR data:
line_item_usage_start_date
line_item_resource_id
line_item_usage_amount
line_item_unblended_cost
line_item_net_unblended_cost **<- This will only be in your data if you are getting a discount on the On-Demand pricing**
line_item_usage_type
line_item_line_item_type
product_region
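If you load the export with pandas rather than Athena, a quick check that the required columns are present might look like the sketch below (column names taken from the list above; line_item_net_unblended_cost is treated as optional, and the toy DataFrame is invented for illustration):

```python
import pandas as pd

# Required CUR columns from the list above (net unblended cost is optional)
REQUIRED = [
    'line_item_usage_start_date', 'line_item_resource_id',
    'line_item_usage_amount', 'line_item_unblended_cost',
    'line_item_usage_type', 'line_item_line_item_type', 'product_region',
]

def missing_columns(df: pd.DataFrame) -> list:
    """Return the required CUR columns missing from the export."""
    return [c for c in REQUIRED if c not in df.columns]

# Toy frame deliberately missing the last required column
df = pd.DataFrame(columns=REQUIRED[:-1])
print(missing_columns(df))  # ['product_region']
```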
You can then create the following query:
SELECT
month as usage_month,
line_item_resource_id,
data_transfer_out_gb,
ondemand_cost,
discounted_cost, **<- Remove if line_item_net_unblended_cost does not exist**
(1 - discounted_cost/ondemand_cost) as discount_percent, **<- Remove if line_item_net_unblended_cost does not exist**
operation,
product_region
FROM
(SELECT
date(date_trunc('month',line_item_usage_start_date)) AS month,
(CASE
WHEN line_item_usage_type LIKE '%DataTransfer-Out-Bytes' THEN 'DataTransfer Out'
WHEN line_item_usage_type LIKE '%-Requests-%' AND line_item_usage_type NOT LIKE '%-OriginShield' THEN 'http/https (proxy) requests'
ELSE 'Other CloudFront operations'
END) as operation,
product_region,
line_item_resource_id,
sum(line_item_usage_amount) as data_transfer_out_gb,
sum(line_item_unblended_cost) as ondemand_cost,
sum(line_item_net_unblended_cost) as discounted_cost **<- Remove if this column does not exist**
FROM **<your table>**
WHERE line_item_product_code = 'AmazonCloudFront'
AND line_item_line_item_type = 'Usage'
AND line_item_usage_start_date >= timestamp '2024-01-01 00:00:00.000' **<- Modify the date accordingly to get the right start date**
AND line_item_usage_start_date < timestamp '2024-04-01 00:00:00.000' **<- Modify to set end date. Example dates get data for Jan/Feb/Mar 2024**
GROUP BY 1,2,3,4)
This will give you a table showing, for each month and CloudFront region, how much DataTransfer Out you had (in GB) and what it cost, AND also how many http/https (proxy) requests you had.
For each combination of month/region/distribution you will get up to 3 lines of data.
If you also want to break out the additional features (Origin Shield and edge functions), the CASE clause needs to be expanded to split them out of the 'Other CloudFront operations' category.
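One way to sketch that expansion, shown here in Python for readers doing the bucketing in pandas rather than SQL: the '-OriginShield' suffix comes from the query above, but any further patterns you add (e.g. for edge functions) are assumptions that must be checked against the usage types actually present in your CUR.

```python
def classify_usage_type(usage_type: str) -> str:
    """Mirror the SQL CASE, with Origin Shield broken out of 'Other'."""
    if usage_type.endswith('DataTransfer-Out-Bytes'):
        return 'DataTransfer Out'
    # Checked before the generic request bucket, matching the SQL's
    # NOT LIKE '%-OriginShield' exclusion
    if usage_type.endswith('-OriginShield'):
        return 'Origin Shield'
    if '-Requests-' in usage_type:
        return 'http/https (proxy) requests'
    return 'Other CloudFront operations'

print(classify_usage_type('EU-DataTransfer-Out-Bytes'))            # DataTransfer Out
print(classify_usage_type('US-Requests-Tier2-HTTPS-OriginShield')) # Origin Shield
print(classify_usage_type('US-Requests-Tier1'))                    # http/https (proxy) requests
```

The Origin Shield check must come before the generic request check, otherwise those rows would never reach their own bucket.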
The easiest way to use this query is to set up a legacy CUR report (https://docs.aws.amazon.com/cur/latest/userguide/dataexports-create-legacy.html) with "Include resource IDs" enabled (but NOT the other two options), making sure to select Athena as a target (so it uses Parquet format). Then set up Glue/Athena to query it from the console: https://docs.aws.amazon.com/cur/latest/userguide/cur-query-athena.html
WHEN line_item_usage_type LIKE '%-Requests-%' AND line_item_usage_type NOT LIKE '%-OriginShield' THEN 'http/https (proxy) requests' ELSE 'Other CloudFront operations'
Hi, I guess these operations don't count toward total bytes transferred, right? Only the ones with "DataTransfer-Out-Bytes"?
They don't contribute, but they can be a cost driver if you have a very busy site, and it may be beneficial to break them out for visibility.
Thank you, Partha. It worked - and thanks for pointing to the right column that has this information. The online documentation isn't very clear about this, though.