DDoS Resilience - the unbelievable importance of HTTP caching

5 minute read
Content level: Advanced

Written primarily with CloudFront in mind, this article describes the importance of HTTP caching in preventing malicious requests from reaching your origin, as well as the CloudFront features and Cache-Control response header values that can help you optimize your cache hit ratio.

Note: This article represents my personal opinion and has not been vetted by AWS peers.

HTTP Caching with CloudFront

Amazon CloudFront's caching capabilities are a powerful tool for protecting your origin servers' capacity. By implementing strategic caching policies, even with short time-to-live (TTL) values, you can significantly reduce the load on your origin infrastructure. When content is served from CloudFront's cache hierarchy of edge POPs and regional edge caches (RECs), with or without CloudFront's Origin Shield feature, requests can be answered from caches within the network, preventing that volume from reaching your origin. For content that isn't yet cached or has expired, CloudFront's native request collapsing further protects origins by consolidating multiple simultaneous requests into a single origin request. This combination of edge caching and request collapsing means that even during high-traffic events or volumetric request floods, only a fraction of requests needs to reach your origin, making a CDN an essential strategy for origin capacity preservation.
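To illustrate how much offload even a very short TTL buys you, here is a minimal sketch that models a single cached URI under a request flood. The in-memory cache and the numbers are illustrative, not a model of CloudFront internals:

```python
# Sketch: how even a 1-second TTL slashes origin load under a request flood.
# A hypothetical in-memory cache stands in for a single CDN cache layer.

def simulate(requests_per_second: int, duration_s: int, ttl_s: float):
    """Return (total_requests, origin_fetches) for a single cached URI."""
    origin_fetches = 0
    expires_at = -1.0  # cache starts empty
    step = 1.0 / requests_per_second
    total = requests_per_second * duration_s
    for i in range(total):
        t = i * step
        if t >= expires_at:          # cache miss or expired entry
            origin_fetches += 1      # one origin fetch repopulates the cache
            expires_at = t + ttl_s   # fresh for the next ttl_s seconds
    return total, origin_fetches

total, fetches = simulate(requests_per_second=1000, duration_s=60, ttl_s=1.0)
print(f"{total} viewer requests -> {fetches} origin fetches")
# prints "60000 viewer requests -> 60 origin fetches"
```

A 1,000 requests-per-second flood against one cache layer becomes one origin request per second: a 1000x reduction, before request collapsing is even considered.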

There's a common misconception among content owners that their content is not CDN-cacheable, when in fact it can be cached or made cacheable. Even when dealing with 'real' dynamic content delivered through asynchronous requests, the main page structure itself can and should be cached. Maximizing the cache hit ratio is crucial for performance and DDoS resilience; it can be the determining factor in whether an HTTP request flood attack impacts your service.

Page cacheability should be established as a core non-functional requirement during the initial planning stages of any website deployment.

Did you know? Statistically, the ‘/’ URI is the most commonly targeted URI by bad actors in HTTP request floods.

The following solutions may be considered for some common perceived “blockers” to caching static content:

  • For web pages requiring immediate updates when new content is published, content freshness can be managed through multiple strategies, leveraging HTTP standard cache controls as follows:
    • Use CloudFront cache invalidations which can be automatically triggered through content management system integrations during publish events.
    • Implement a split TTL approach - set a 0 TTL for viewers via the Cache-Control ‘max-age=0’ directive. This means that clients will always check for content freshness on page reload.
    • Set longer TTLs at the CDN level using the ‘s-maxage’ Cache-Control directive or a non-zero ‘minimumTTL’ in the relevant distribution cache policy (the latter is preferred to enable request collapsing).
    • Set ETag/Last-Modified response headers to enable conditional requests, which optimize page load times and minimize Data Transfer Out charges by allowing HTTP 304 (Not Modified) responses with an empty response body when content has not changed.
  • For static assets like images, scripts, and other page resources, implement unique file versioning with far-future client TTLs (1 year or more). Instead of updating files in place, publish new versions with unique URIs or query strings, and update their references in HTML. This strategy provides optimal delivery efficiency and maximum offload across browser caches, CDN, and origin services while ensuring immediate content updates through reference changes rather than cache invalidations.
  • For sites with mixed content, create multiple cache behaviors with appropriate cache policies to maximize DDoS resilience. While static content should be aggressively cached, truly dynamic endpoints that cannot be cached should be identified early and protected with additional controls such as scoped-down AWS WAF rate-based rules with lower limits than any ‘catch-all’ rate-based rules.
  • Resources with Cache-Control response headers like ‘no-cache, no-store’ can be cached at the CDN by using a non-zero ‘minimumTTL’ in the relevant CloudFront cache behavior, without compromising security.
  • If you are currently serving cookies in responses to page requests, instead serve cookies from an asynchronous request, so that the page response itself remains cacheable.
  • If origin performance has been adversely impacted (for whatever reason), consider adding the ‘stale-while-revalidate’ Cache-Control directive so that users no longer need to wait for responses from the origin. It’s ideal for content that refreshes frequently or unpredictably, or that requires significant time to regenerate, where having the latest version of the content is not essential.
  • The ‘stale-if-error’ directive enhances the user experience and improves availability by serving stale content when origins return an error.

Request collapsing

With request collapsing, only a single request for a given cache key is sent to the origin from a given regional edge cache (REC); other concurrent requests for the same cache key are paused until the outstanding response is received at the REC. During a typical DDoS attack, this reduces the number of requests sent to the origin from each REC to one. Request collapsing matters most when a requested URI path is not in cache, or has expired.
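The mechanism can be modelled with a per-key lock: the first request for a key fetches from the origin while its followers wait, then find the cache populated. This is a simplified sketch of the behaviour, not CloudFront's actual implementation:

```python
# Sketch of request collapsing: concurrent requests for the same cache key
# share one origin fetch. The REC behaviour is modelled with a per-key lock.
import threading
from concurrent.futures import ThreadPoolExecutor

cache = {}
locks = {}
meta_lock = threading.Lock()
origin_fetches = 0

def fetch_origin(key):
    global origin_fetches
    origin_fetches += 1          # count how often the origin is actually hit
    return f"body-for-{key}"

def get(key):
    with meta_lock:
        lock = locks.setdefault(key, threading.Lock())
    with lock:                   # only one request proceeds to the origin
        if key not in cache:     # followers find the cache already populated
            cache[key] = fetch_origin(key)
    return cache[key]

# 50 simultaneous requests for '/' collapse into a single origin fetch.
with ThreadPoolExecutor(max_workers=50) as ex:
    results = list(ex.map(get, ["/"] * 50))

print(origin_fetches)  # prints 1
```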

The following configurations mean that request collapsing will not occur:

  • Caching is disabled for the cache behavior (for example, via the managed ‘CachingDisabled’ cache policy).
  • The cache policy’s minimum TTL is 0 and the origin returns Cache-Control directives (such as ‘no-cache’, ‘no-store’, or ‘private’) that result in an effective TTL of 0.

A request flood attack, in conjunction with non-collapsed but otherwise cacheable requests, can be sufficient to overwhelm some origins, especially those that are unable to dynamically auto-scale.

Origin Shield

Enabling CloudFront Origin Shield can further reduce the load on your origin by adding a centralized caching layer in front of it, so that cache misses from all regional edge caches funnel through a single location. This can be useful when you have a low-capacity origin without the option to dynamically scale.

EC2 or S3-based caching

There are some edge cases where customers do not want to use a global CDN like CloudFront. In that case, it’s still worth setting up a caching tier, using S3 or dynamically scaled EC2 instances, to protect the content source.