
High CloudFront data transfer from US (Ohio) with low request count


Hi everyone,

I recently created a CloudFront distribution connected to an EC2 origin (pay-as-you-go setup), as we manage around 15 apex domains.

In the first few weeks of usage, I noticed unusually high outbound traffic from a single location: United States – Ohio.

Specifically, between April 1st and April 17th:

  • Ohio: ~315,000 requests, 2,068 GB transferred
  • Michigan: ~2,110,000 requests, 145 GB transferred

So the request count from Ohio is relatively low, but the volume of data transferred there is very high.

Before introducing CloudFront (when traffic was served directly from EC2), we never observed such high data transfer volumes.
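As a quick check of the per-request averages, this sketch divides the reported volume by the request count for each location. It assumes "GB" in the report means decimal gigabytes (10^9 bytes); binary units would shift the numbers slightly but not the gap between them.

```python
# Figures reported in the question for April 1-17.
# Assumption: "GB" is decimal (10**9 bytes), so 1 GB = 1000 MB.
REPORTED = {
    "Ohio": {"requests": 315_000, "gb": 2_068},
    "Michigan": {"requests": 2_110_000, "gb": 145},
}


def mb_per_request(requests: int, gb: float) -> float:
    """Average response size in MB (10**6 bytes) per request."""
    return gb * 1000 / requests


for location, figures in REPORTED.items():
    avg = mb_per_request(figures["requests"], figures["gb"])
    print(f"{location}: {avg:.3f} MB per request")
# Ohio: 6.565 MB per request
# Michigan: 0.069 MB per request
```

Ohio responses average roughly 95 times the size of Michigan responses (about 6.6 MB vs about 69 KB), so the request count alone does not explain the transfer volume.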

asked 23 days ago · 88 views
6 Answers

To answer your recent questions/comment and help you pinpoint the source of those 2 TB in Ohio:

  • Logs: You do not need WAF logs for this. Simply enable CloudFront Standard Logs (Access Logs). When delivered to S3 they are free to generate (you only pay for S3 storage), and they include the Client IP, User-Agent, and the specific URI being requested. This will show you exactly who (or what) is pulling the data.
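  • The "6.5 MB" Math: Your math is key here. 2,068 GB over ~315,000 requests averages about 6.5 MB per request. If your static files are all under 1 MB, some responses served to Ohio must be far larger than those assets: either large files you weren't counting, or bloated dynamic responses. Repeated requests for the same small resources would raise the total, but not the per-request average.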
  • Cost Misconception: You are correct that Data Transfer from EC2 to CloudFront is free ($0.00). However, you do pay CloudFront for the Data Transfer Out (DTO) from the Ohio edge location to the internet. Those 2 TB are hitting your bill regardless of whether they are cached or not.
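A small model of the billing point above, as a sketch: byte_hit_ratio is a hypothetical input (the share of delivered bytes served from the edge cache), and the origin-fetch figure is an approximation. Whatever the hit ratio, the billed data transfer out stays the same; only the load on the origin changes.

```python
def transfer_breakdown(delivered_gb: float, byte_hit_ratio: float) -> dict:
    """Split viewer-facing transfer into billed DTO and approximate origin fetches.

    delivered_gb: data sent from CloudFront edges to viewers, in GB.
    byte_hit_ratio: hypothetical share of those bytes served from cache (0..1).
    """
    if not 0.0 <= byte_hit_ratio <= 1.0:
        raise ValueError("byte_hit_ratio must be between 0 and 1")
    return {
        # Every byte sent to a viewer is billed as CloudFront DTO, hit or miss.
        "billed_dto_gb": delivered_gb,
        # Misses were fetched from the origin first. From an AWS origin such as
        # EC2 this is not billed, but it still loads the instance.
        # Approximation: ignores compression and partial (range) responses.
        "origin_fetch_gb": delivered_gb * (1.0 - byte_hit_ratio),
    }


for ratio in (0.50, 0.95):
    print(ratio, transfer_breakdown(2_068, ratio))
```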

I think the following can help here:

  1. Enable Standard Logs and wait an hour.

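  2. Use Amazon Athena or even a simple text editor to look for the top IPs in the Ohio traffic. The x-edge-location field holds airport-style codes rather than state names, so check which codes actually appear in your logs for that traffic before filtering on them.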

  3. Check for "Range" headers: Sometimes bots use these to bypass standard caching or scrape content in chunks.

  4. Once you have the IP, use your existing basic WAF to create a "Block" rule for that specific IP address to stop the bleed immediately.
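The log analysis in steps 1-2 can also be done locally once the log files are downloaded from S3. This is a minimal sketch assuming the standard (legacy) log format: gzipped, tab-separated files with a "#Fields:" header line naming columns such as x-edge-location, sc-bytes, and c-ip. The helper names are hypothetical.

```python
import gzip
from collections import Counter
from pathlib import Path


def iter_records(lines):
    """Yield one dict per log entry, keyed by the names in the '#Fields:' header."""
    fields = None
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("#Fields:"):
            fields = line[len("#Fields:"):].split()
        elif line and not line.startswith("#") and fields:
            yield dict(zip(fields, line.split("\t")))


def read_log_dir(log_dir):
    """Stream lines from every gzipped log file in a local directory."""
    for path in sorted(Path(log_dir).glob("*.gz")):
        with gzip.open(path, "rt", encoding="utf-8") as fh:
            yield from fh


def top_ips_by_bytes(lines, edge_prefix="", n=10):
    """Sum sc-bytes per c-ip; optionally keep only edge codes starting with edge_prefix."""
    totals = Counter()
    for rec in iter_records(lines):
        if not rec.get("x-edge-location", "").startswith(edge_prefix):
            continue
        size = rec.get("sc-bytes", "-")
        totals[rec.get("c-ip", "-")] += int(size) if size.isdigit() else 0
    return totals.most_common(n)
```

For example, call top_ips_by_bytes(read_log_dir("cf-logs")) on a hypothetical local folder "cf-logs" holding the downloaded .gz files, and pass edge_prefix with a code taken from your own logs to narrow the result to the Ohio traffic.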

EXPERT
answered 22 days ago
EXPERT
reviewed 20 days ago

While the AI suggestions are technically correct, check these two practical possibilities:

1. Large File Crawlers/Bots: A specific IP or bot (possibly an AWS-based service in us-east-2) might be pulling large assets or backups via your apex domains. Check your CloudFront access logs, filter for the Ohio traffic (by client IP, or by the x-edge-location field, which uses airport-style codes), and identify the specific IP and User-Agent.

2. Cache Misses on Large Objects: If your cache hit ratio is low in Ohio, every miss fetches the full object (about 6.5 MB per request on average) from your EC2 origin. Ensure headers like Authorization or Set-Cookie aren't preventing caching for these large files.
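To check point 2, a byte-weighted cache hit ratio per URI can be computed from the access logs. A sketch, assuming the standard log format with a "#Fields:" header; it treats the x-edge-result-type values Hit and RefreshHit as cache hits, and the function names are illustrative.

```python
from collections import defaultdict

# x-edge-result-type values that mean the edge served the object from cache.
CACHE_HIT_TYPES = {"Hit", "RefreshHit"}


def iter_records(lines):
    """Yield one dict per log entry, keyed by the names in the '#Fields:' header."""
    fields = None
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("#Fields:"):
            fields = line[len("#Fields:"):].split()
        elif line and not line.startswith("#") and fields:
            yield dict(zip(fields, line.split("\t")))


def byte_hit_ratio_by_uri(lines, min_bytes=0):
    """Return (uri, total_bytes, share_of_bytes_from_cache) rows, largest first."""
    total = defaultdict(int)
    from_cache = defaultdict(int)
    for rec in iter_records(lines):
        size = rec.get("sc-bytes", "-")
        nbytes = int(size) if size.isdigit() else 0
        uri = rec.get("cs-uri-stem", "-")
        total[uri] += nbytes
        if rec.get("x-edge-result-type") in CACHE_HIT_TYPES:
            from_cache[uri] += nbytes
    rows = [
        (uri, nbytes, from_cache[uri] / nbytes)
        for uri, nbytes in total.items()
        if nbytes > 0 and nbytes >= min_bytes
    ]
    return sorted(rows, key=lambda row: row[1], reverse=True)
```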

So, try to identify the top IPs in your CloudFront access logs for the Ohio traffic to see who exactly is pulling those 2 TB.

See also: https://aws.amazon.com/blogs/networking-and-content-delivery/cost-optimizing-your-aws-architectures-by-utilizing-amazon-cloudfront-features/

EXPERT
answered 23 days ago
EXPERT
reviewed 20 days ago

The unusually high data transfer from Ohio with relatively low request count suggests that CloudFront is serving objects with very large file sizes from that region. Based on how CloudFront categorizes data transfer, this pattern could indicate a few scenarios:

The high bytes-per-request ratio (about 6.5 MB per request from Ohio vs about 0.07 MB, roughly 69 KB, per request from Michigan) means individual requests from Ohio are transferring significantly larger objects. This could be due to:

  1. Large static or dynamic content: CloudFront categorizes transferred data by object TTL (Time to Live). Objects with TTL ≥ 3600 seconds are classified as "static," while objects with TTL < 3600 seconds are "dynamic." If you're serving large files like videos, software downloads, or large media assets from Ohio, this would explain the high transfer volume.

  2. Proxy requests: CloudFront also tracks data returned in response to DELETE, OPTIONS, PATCH, POST, and PUT requests separately as "proxy" traffic. If your application returns large responses to POST or similar requests from the Ohio region, this could contribute to the high data transfer. Large uploads, by contrast, are data sent to your origin and are counted separately from data transfer out to viewers.

  3. Caching behavior: Large objects served to Ohio viewers may not be cached effectively at the edge, so each request requires a fetch from your origin. This increases load on the origin, but it does not change the amount of data CloudFront delivers to viewers, which is what data transfer out measures.
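The categorization rules in points 1 and 2 can be expressed as a small function. This is an illustrative sketch of the rules as described, not CloudFront's own code. ttl_seconds is an assumed input, since the TTL comes from your cache policy and origin headers rather than from the access logs, and the sketch assumes the method check takes precedence over the TTL.

```python
# Request methods whose responses the usage reports count as "proxy" traffic.
PROXY_METHODS = {"DELETE", "OPTIONS", "PATCH", "POST", "PUT"}
# Objects with a TTL at or above this threshold count as "static".
STATIC_TTL_THRESHOLD_SECONDS = 3600


def usage_category(method: str, ttl_seconds: int) -> str:
    """Classify a response as 'proxy', 'static', or 'dynamic' per the rules described."""
    if method.upper() in PROXY_METHODS:
        return "proxy"
    return "static" if ttl_seconds >= STATIC_TTL_THRESHOLD_SECONDS else "dynamic"


print(usage_category("GET", 86_400))   # static
print(usage_category("GET", 0))        # dynamic
print(usage_category("POST", 86_400))  # proxy
```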

I'd recommend reviewing your CloudFront usage reports to break down the data transfer by type (static, dynamic, or proxy) and protocol (HTTP vs HTTPS) for the Ohio region. This will help identify whether specific types of content or request methods are driving the high transfer volumes.
Sources
View CloudFront usage reports - Amazon CloudFront

answered 23 days ago
EXPERT
reviewed 23 days ago

The pattern you are seeing, low request count but very high data transfer from Ohio, is a strong indicator of a few specific issues worth investigating.

Why Ohio specifically: CloudFront edge locations serve Ohio viewers from nearby, but Ohio also hosts the AWS US East (Ohio) Region (us-east-2) and other data centre infrastructure, so a large amount of automated and bot traffic originates there. High data transfer with a low request count points to large response payloads per request, not a high volume of users.

Most likely causes:

  • Large uncached responses fetched from your EC2 origin. If CloudFront is not caching responses correctly, every request triggers a full origin fetch. Check your cache hit ratio in CloudFront metrics; if it is low, your cache behaviour settings need adjusting.
  • Automated traffic or crawlers. Bots tend to request large assets repeatedly. Check your access logs for repeated requests to the same URLs, unusual user agents, or requests with no referrer.
  • No compression enabled. If gzip or Brotli compression is not configured, compressible responses (HTML, CSS, JS, XML) are transferred at full size. Enable compression in the relevant cache behaviour of your CloudFront distribution.
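A sketch of the log check for automated traffic: group requests by User-Agent and report bytes sent, the share of requests without a Referer, and the most-requested URI. It assumes the standard log format, where an empty field is written as "-" and the User-Agent value is URL-encoded; the function names are hypothetical.

```python
from collections import Counter, defaultdict
from urllib.parse import unquote


def iter_records(lines):
    """Yield one dict per log entry, keyed by the names in the '#Fields:' header."""
    fields = None
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("#Fields:"):
            fields = line[len("#Fields:"):].split()
        elif line and not line.startswith("#") and fields:
            yield dict(zip(fields, line.split("\t")))


def user_agent_summary(lines, n=10):
    """Per User-Agent: requests, bytes, share without Referer, most-requested URI."""
    requests = Counter()
    sent = Counter()
    no_referer = Counter()
    uris = defaultdict(Counter)
    for rec in iter_records(lines):
        # The log URL-encodes the User-Agent value (for example, spaces as %20).
        ua = unquote(rec.get("cs(User-Agent)", "-"))
        size = rec.get("sc-bytes", "-")
        requests[ua] += 1
        sent[ua] += int(size) if size.isdigit() else 0
        if rec.get("cs(Referer)", "-") == "-":
            no_referer[ua] += 1
        uris[ua][rec.get("cs-uri-stem", "-")] += 1
    rows = []
    for ua, count in requests.items():
        top_uri, top_hits = uris[ua].most_common(1)[0]
        rows.append({
            "user_agent": ua,
            "requests": count,
            "bytes": sent[ua],
            "no_referer_share": no_referer[ua] / count,
            "top_uri": top_uri,
            "top_uri_requests": top_hits,
        })
    # Heaviest User-Agents first.
    rows.sort(key=lambda row: row["bytes"], reverse=True)
    return rows[:n]
```

A User-Agent that combines high bytes, a Referer share near 1.0, and many requests to the same URI is a good candidate for the WAF rule in the next steps.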

What to do next:

  1. Enable CloudFront access logging and inspect the Ohio traffic for patterns.
  2. Check your cache hit ratio under CloudFront metrics in the console.
  3. Set up AWS WAF with a Bot Control rule group to block automated traffic.
  4. Review your cache behaviour TTL settings to ensure responses are being cached.

Reference: CloudFront cache hit ratio optimisation: https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/cache-hit-ratio.html

AWS WAF bot control: https://docs.aws.amazon.com/waf/latest/developerguide/aws-managed-rules-bot-control-rule-group.html

CloudFront access logs: https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/AccessLogs.html

answered 21 days ago
EXPERT
reviewed 21 days ago

Hi Florian, Thank you for your reply.

Regarding the domains, I have a website with small static resources (images/CSS/JS) that practically never exceed 1 MB each.

The cache hit ratio is good when measured in bytes transferred from the EC2 origin, though lower when measured in request count, since the base page document (HTML) is dynamic content and is never cached.

Can you confirm that to see statistics on IP or User Agent I need to enable AWS WAF logs? At the moment I only have the basic WAF (Core protections) enabled, without DDoS and Bot protection.

One more confirmation: the cache hit ratio is only used to evaluate how much traffic is transferred from EC2 to CloudFront edge locations, right? And since this traffic from EC2 to any edge location is free, it doesn’t impact costs.

answered 23 days ago
EXPERT
reviewed 22 days ago

Hi,

Following further investigation, we were able to identify the root cause of the high traffic.

By analyzing CloudFront access logs using Athena, we found that several AI/crawler bots were repeatedly requesting large resources, in particular XML sitemaps and PDF catalog files. Although the number of requests was relatively low, the size of these files resulted in a high volume of data transfer (BytesOut), which appeared concentrated in a single location.
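Athena was used for this analysis; a similar breakdown can also be sketched locally by totalling bytes per file extension, which makes sitemaps (.xml) and catalogs (.pdf) stand out. A minimal sketch assuming standard log files with a "#Fields:" header; the function name is hypothetical.

```python
from collections import Counter
from pathlib import PurePosixPath


def iter_records(lines):
    """Yield one dict per log entry, keyed by the names in the '#Fields:' header."""
    fields = None
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("#Fields:"):
            fields = line[len("#Fields:"):].split()
        elif line and not line.startswith("#") and fields:
            yield dict(zip(fields, line.split("\t")))


def bytes_by_extension(lines):
    """Return (extension, total_bytes, requests) tuples, largest byte total first."""
    sent = Counter()
    requests = Counter()
    for rec in iter_records(lines):
        # "/catalog.PDF" -> ".pdf"; paths without an extension (e.g. "/") -> "(none)".
        ext = PurePosixPath(rec.get("cs-uri-stem", "")).suffix.lower() or "(none)"
        size = rec.get("sc-bytes", "-")
        sent[ext] += int(size) if size.isdigit() else 0
        requests[ext] += 1
    return [(ext, nbytes, requests[ext]) for ext, nbytes in sent.most_common()]
```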

To mitigate the issue, we implemented an AWS WAF rule that blocks these bots based on User-Agent patterns. After applying this rule, the traffic spike has been resolved and returned to expected levels.
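A User-Agent block rule like the one described could look roughly like this in AWS WAF (WAFv2) rule JSON, the shape used in the console's rule JSON editor. The tokens ExampleBot and SampleCrawler are hypothetical placeholders, and the rule name and priority are assumptions. If you pass the rule through an SDK or the CLI instead, check how that tool expects the SearchString value to be encoded.

```python
import json


def build_user_agent_block_rule(tokens, name="block-crawler-user-agents", priority=0):
    """Build a WAFv2 rule (JSON shape) that blocks requests whose User-Agent contains any token.

    The User-Agent header is lowercased before matching, so tokens are lowercased too.
    """
    if not tokens:
        raise ValueError("at least one User-Agent token is required")
    matches = [
        {
            "ByteMatchStatement": {
                "SearchString": token.lower(),
                "FieldToMatch": {"SingleHeader": {"Name": "user-agent"}},
                "TextTransformations": [{"Priority": 0, "Type": "LOWERCASE"}],
                "PositionalConstraint": "CONTAINS",
            }
        }
        for token in tokens
    ]
    # An OrStatement needs at least two nested statements.
    statement = matches[0] if len(matches) == 1 else {"OrStatement": {"Statements": matches}}
    return {
        "Name": name,
        "Priority": priority,
        "Statement": statement,
        "Action": {"Block": {}},
        "VisibilityConfig": {
            "SampledRequestsEnabled": True,
            "CloudWatchMetricsEnabled": True,
            "MetricName": name,
        },
    }


# Hypothetical User-Agent fragments; substitute the ones found in your own logs.
print(json.dumps(build_user_agent_block_rule(["ExampleBot", "SampleCrawler"]), indent=2))
```

User-Agent strings are set by the client, so a rule like this only stops crawlers that identify themselves honestly.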

Thank you for your support.

answered 16 days ago

