I want to monitor how many requests originate from client IP addresses so I can set a realistic rate limit value.
Background:
A rate-based rule counts incoming requests and rate limits them when they arrive faster than a configured threshold. The rule aggregates requests according to your criteria, then counts and rate limits each aggregate grouping based on the rule's evaluation window, request limit, and action settings. As of this writing, the available evaluation windows (the amount of time over which requests are counted) are 1 minute, 2 minutes, 5 minutes, and 10 minutes. Before configuring a rate-based rule, customers want an idea of the number of aggregate requests coming from a client, which they can then use as a baseline for setting the rate-limit value.
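For reference, the evaluation window, request limit, and action settings mentioned above come together in a rate-based rule statement as used with the wafv2 API. The sketch below shows the general shape; the rule name, the limit of 1000, and the 300-second window are placeholder values, not recommendations (finding a realistic limit is what the rest of this article is about).

```python
# Sketch of a WAF rate-based rule (wafv2 API shape).
# The Limit and EvaluationWindowSec values are placeholders only.
rate_rule = {
    "Name": "rate-limit-per-ip",          # hypothetical rule name
    "Priority": 0,
    "Statement": {
        "RateBasedStatement": {
            "Limit": 1000,                # max requests per evaluation window
            "EvaluationWindowSec": 300,   # 60, 120, 300, or 600 seconds
            "AggregateKeyType": "IP",     # aggregate requests by client IP
        }
    },
    "Action": {"Block": {}},              # action applied above the limit
    "VisibilityConfig": {
        "SampledRequestsEnabled": True,
        "CloudWatchMetricsEnabled": True,
        "MetricName": "rate-limit-per-ip",
    },
}
```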
Resolution:
Customers can analyze existing WAF logs over a period of time to estimate the current request rate.
Pre-requisites:
- AWS WAF logging must be enabled, with logs delivered to a CloudWatch Logs log group or to an Amazon S3 bucket.
Query for CloudWatch Logs log group:
- In CloudWatch Logs Insights, select the log group used for WAF logging (the name starts with the "aws-waf-logs-" prefix).
- Run the query below.
fields httpRequest.clientIp as ClientIP
| stats count(*) as requestCount by bin(5m), ClientIP
| sort requestCount desc
| limit 100
fields httpRequest.clientIp as ClientIP
- This line selects the client IP address from the httpRequest field in the log data.
- It renames this field as "ClientIP" for easier reference in the output.
| stats count(*) as requestCount by bin(5m), ClientIP
- count(*) counts all log entries.
- as requestCount names this count "requestCount".
- by bin(5m), ClientIP groups the counts into 5-minute time intervals per client IP.
| limit 100
- This limits the output to the first 100 aggregated rows.
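The aggregation this query performs can be sketched in plain Python: bin each log entry's timestamp down to the start of its 5-minute window, then count entries per (window, client IP) pair. The sample events below are made-up placeholders.

```python
from collections import Counter

def count_requests(events, window_secs=300):
    """Count requests per (5-minute bin, client IP), mirroring
    `stats count(*) as requestCount by bin(5m), ClientIP`."""
    counts = Counter()
    for ts, client_ip in events:          # ts = epoch seconds
        bucket = ts - (ts % window_secs)  # floor to the window start
        counts[(bucket, client_ip)] += 1
    return counts

# Made-up sample events: (epoch seconds, client IP)
events = [
    (1724140800, "203.0.113.10"),
    (1724140950, "203.0.113.10"),  # same 5-minute bin as the first event
    (1724141100, "203.0.113.10"),  # falls into the next 5-minute bin
]
counts = count_requests(events)
print(counts[(1724140800, "203.0.113.10")])  # → 2
```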
Query for S3 bucket using Athena:
- In Athena, create a Database and a table for the WAF logs.
- To create a database, please follow the instructions in this guide - https://docs.aws.amazon.com/athena/latest/ug/step-1-create-a-database.html
- To create a table, copy and paste the following DDL statement into the Athena query editor, and modify the value in LOCATION to the URI of the S3 bucket used for storing the WAF logs.
CREATE EXTERNAL TABLE `waf_logs` (
`timestamp` bigint,
`formatversion` int,
`webaclid` string,
`terminatingruleid` string,
`terminatingruletype` string,
`action` string,
`httpsourcename` string,
`httpsourceid` string,
`rulegrouplist` array<string>,
`ratebasedrulelist` array<struct<ratebasedruleid:string,limitkey:string,maxrateallowed:int>>,
`nonterminatingmatchingrules` array<struct<ruleid:string,action:string>>,
`httprequest` struct<clientip:string,country:string,headers:array<struct<name:string,value:string>>,uri:string,args:string,httpversion:string,httpmethod:string,requestid:string>
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES ('paths'='action,formatVersion,httpRequest,httpSourceId,httpSourceName,nonTerminatingMatchingRules,rateBasedRuleList,ruleGroupList,terminatingRuleId,terminatingRuleType,timestamp,webaclId')
STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION 's3://your-waf-logs-directory/<WebACL>/'
- Run the DDL statement in the Athena console. After it completes, you can query the table using the statement below (adjust the dates to the period you want to analyze).
WITH test_dataset AS (
SELECT
format_datetime(from_unixtime((timestamp/1000) - ((minute(from_unixtime(timestamp / 1000))%5) * 60)),'yyyy-MM-dd HH:mm') AS five_minutes_ts,
httprequest.clientip
FROM waf_logs
WHERE from_unixtime(timestamp / 1000) >= date '2025-08-20'
AND from_unixtime(timestamp / 1000) < date '2025-08-21'
)
SELECT five_minutes_ts, clientip, count(*) AS ip_count
FROM test_dataset
GROUP BY five_minutes_ts, clientip;
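The five_minutes_ts expression above rounds each epoch-millisecond timestamp down to the start of its 5-minute window. The same arithmetic in Python, using a made-up timestamp, looks like this:

```python
from datetime import datetime, timezone

def five_minute_bucket(ts_millis):
    """Mirror the Athena five_minutes_ts expression: floor the
    epoch-millisecond timestamp to its 5-minute window and format it."""
    secs = ts_millis // 1000
    dt = datetime.fromtimestamp(secs, tz=timezone.utc)
    secs -= (dt.minute % 5) * 60  # subtract the minutes past the window start
    bucket = datetime.fromtimestamp(secs, tz=timezone.utc)
    return bucket.strftime("%Y-%m-%d %H:%M")

# 2025-08-20 10:07:31 UTC falls in the window starting at 10:05
print(five_minute_bucket(1755684451000))  # → 2025-08-20 10:05
```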
Notes
Both the CloudWatch Logs Insights and Athena queries above assume a 5-minute evaluation window. If you would like to use one of the other evaluation window options, make sure to modify the corresponding parts of the queries.
CloudWatch - change the interval in | stats count(*) as requestCount by bin(5m)
Athena - change the 5-minute arithmetic in the five_minutes_ts expression (the %5 modulus and the column alias) to match the window you want to use.
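Once you have per-window request counts for legitimate clients from either query, one simple heuristic (an illustration only, not an AWS recommendation) is to take the peak observed count per window and add headroom before using it as the rule's limit. The numbers below are made up.

```python
def suggest_rate_limit(window_counts, headroom=1.5):
    """Pick a candidate rate limit: the highest per-window request
    count observed from normal traffic, times a safety margin.
    A rough heuristic only; tune headroom for your own traffic."""
    peak = max(window_counts)
    return int(peak * headroom)

# Made-up per-IP request counts per 5-minute window
observed = [120, 95, 210, 180, 160]
print(suggest_rate_limit(observed))  # → 315
```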