
How can I analyze AWS WAF logs to determine a baseline threshold when using rate-based rules?

4 minute read
Content level: Intermediate

I want to monitor how many requests originate from client IP addresses so I can set a realistic rate limit value.

Background: A rate-based rule counts incoming requests and rate limits them when they arrive too quickly. The rule aggregates requests according to your criteria, then counts and rate limits each aggregate group based on the rule's evaluation window, request limit, and action settings. At the time of writing, the available evaluation windows (the time span over which requests are counted) are 1 minute, 2 minutes, 5 minutes, and 10 minutes. Before configuring a rate-based rule, it helps to know how many requests a single client typically sends within that window, so that the figure can serve as a baseline for the rate limit.

Resolution: Analyze existing WAF logs over a period of time to estimate the current request rate per client.

Pre-requisites:

  • Logging is enabled on the web ACL, with the logging destination set to a CloudWatch Logs log group or an S3 bucket. For more information on how to set up WAF logging, see https://docs.aws.amazon.com/waf/latest/developerguide/logging-destinations.html
    • For a CloudWatch Logs log group, use Logs Insights to query the logs.
    • For an S3 bucket, use Athena to query the logs.

Query for CloudWatch Logs log group:

  1. In CloudWatch Logs Insights, select the log group used for WAF logging (its name starts with “aws-waf-logs-”).
  2. Run the query below.
fields httpRequest.clientIp as ClientIP
| stats count(*) as requestCount by bin(5m), ClientIP
| sort requestCount desc
| limit 100

fields httpRequest.clientIp as ClientIP

  • This line selects the client IP address from the httpRequest field in the log data.
  • It renames this field as "ClientIP" for easier reference in the output.

| stats count(*) as requestCount by bin(5m), ClientIP

  • count(*) counts all log entries.
  • as requestCount names this count "requestCount".
  • by bin(5m), ClientIP groups these counts into 5-minute time intervals, with one count per client IP address in each interval.

| sort requestCount desc

  • This orders the aggregated rows from the highest request count to the lowest.

| limit 100

  • This limits the output to the first 100 aggregated rows, which after sorting are the 100 highest per-interval counts.
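To sanity-check what the aggregation produces, the same grouping can be approximated locally. The following is a minimal Python sketch, assuming each log line is a JSON record with an epoch-millisecond timestamp field and an httpRequest.clientIp field, as in the query above. The function name count_by_window and the sample records are hypothetical and only for illustration.

```python
import json
from collections import Counter
from datetime import datetime, timezone

WINDOW_SECONDS = 5 * 60  # corresponds to bin(5m) in the query above


def count_by_window(log_lines, window_seconds=WINDOW_SECONDS):
    """Count requests per (window start, client IP) pair."""
    counts = Counter()
    for line in log_lines:
        record = json.loads(line)
        epoch_seconds = record["timestamp"] // 1000  # timestamps are epoch milliseconds
        # Round down to the start of the window the request falls into.
        window_start = epoch_seconds - (epoch_seconds % window_seconds)
        counts[(window_start, record["httpRequest"]["clientIp"])] += 1
    return counts


# Hypothetical sample records, trimmed to the two fields the query uses.
sample_lines = [
    '{"timestamp": 1755648000000, "httpRequest": {"clientIp": "198.51.100.7"}}',
    '{"timestamp": 1755648120000, "httpRequest": {"clientIp": "198.51.100.7"}}',
    '{"timestamp": 1755648299000, "httpRequest": {"clientIp": "203.0.113.9"}}',
    '{"timestamp": 1755648300000, "httpRequest": {"clientIp": "198.51.100.7"}}',
]

counts = count_by_window(sample_lines)
for (window_start, client_ip), request_count in sorted(counts.items()):
    start = datetime.fromtimestamp(window_start, tz=timezone.utc)
    print(start.strftime("%Y-%m-%d %H:%M"), client_ip, request_count)
```

In this sample, the first two requests from 198.51.100.7 land in the same 5-minute interval and are counted together, while its last request starts a new interval.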

Query for S3 bucket using Athena:

  1. In Athena, create a database and a table for the WAF logs.
  2. To create a database, follow the instructions in this guide - https://docs.aws.amazon.com/athena/latest/ug/step-1-create-a-database.html
  3. To create a table, copy and paste the following DDL statement into the Athena console, and change the LOCATION value to the URI of the S3 bucket that stores the WAF logs.
CREATE EXTERNAL TABLE `waf_logs` (
  `timestamp` bigint,
  `formatversion` int,
  `webaclid` string, 
  `terminatingruleid` string, 
  `terminatingruletype` string, 
  `action` string, 
  `httpsourcename` string, 
  `httpsourceid` string, 
  `rulegrouplist` array<string>, 
  `ratebasedrulelist` array<struct<ratebasedruleid:string,limitkey:string,maxrateallowed:int>>, 
  `nonterminatingmatchingrules` array<struct<ruleid:string,action:string>>, 
  `httprequest` struct<clientip:string,country:string,headers:array<struct<name:string,value:string>>,uri:string,args:string,httpversion:string,httpmethod:string,requestid:string>
 )
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe' 
WITH SERDEPROPERTIES ('paths'='action,formatVersion,httpRequest,httpSourceId,httpSourceName,nonTerminatingMatchingRules,rateBasedRuleList,ruleGroupList,terminatingRuleId,terminatingRuleType,timestamp,webaclId')
STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' 
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION 's3://your-waf-logs-directory/<WebACL>/'
  4. Run the DDL statement in the Athena console. After it completes, query the table using the statement below.
WITH test_dataset AS (
  SELECT 
    format_datetime(from_unixtime((timestamp/1000) - ((minute(from_unixtime(timestamp / 1000))%5) * 60)),'yyyy-MM-dd HH:mm') AS five_minutes_ts,
    httprequest.clientip 
  FROM waf_logs 
  WHERE from_unixtime(timestamp / 1000) >= date '2025-08-20' 
    AND from_unixtime(timestamp / 1000) < date '2025-08-21'
)
SELECT five_minutes_ts, clientip, count(*) AS ip_count 
FROM test_dataset 
GROUP BY five_minutes_ts, clientip;
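The five_minutes_ts expression is dense, so it may help to see the same arithmetic written out step by step. This is a Python sketch of that rounding, assuming timestamps are interpreted in UTC. The function name window_label and the window_minutes parameter are hypothetical additions for illustration.

```python
from datetime import datetime, timezone


def window_label(timestamp_ms, window_minutes=5):
    """Round an epoch-millisecond timestamp down to its window, formatted like five_minutes_ts."""
    epoch_seconds = timestamp_ms // 1000
    minute_of_hour = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc).minute
    # Subtract the whole minutes elapsed since the last window boundary,
    # as the SQL does with (minute(...) % 5) * 60.
    shifted = epoch_seconds - (minute_of_hour % window_minutes) * 60
    # The output format has no seconds field, which completes the rounding down.
    return datetime.fromtimestamp(shifted, tz=timezone.utc).strftime("%Y-%m-%d %H:%M")


print(window_label(1755648299000))     # 00:04:59 UTC -> 2025-08-20 00:00
print(window_label(1755648300000))     # 00:05:00 UTC -> 2025-08-20 00:05
print(window_label(1755648299000, 2))  # 2-minute window -> 2025-08-20 00:04
```

Because the window boundaries are derived from the minute of the hour, this approach only lines up cleanly for window sizes that divide 60, which holds for the 1, 2, 5, and 10 minute evaluation windows.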

Notes: Both the CloudWatch Logs Insights and Athena queries above count requests in 5-minute intervals, matching a 5-minute evaluation window. To use another evaluation window, change the interval in each query:

  • CloudWatch: change bin(5m) in | stats count(*) as requestCount by bin(5m), ClientIP to the window you want, for example bin(1m) or bin(10m).
  • Athena: change the %5 in the five_minutes_ts expression to the number of minutes in the window you want, and rename five_minutes_ts to match.
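Once either query has produced per-interval counts, they still need to be reduced to a single candidate number. One possible way to do that is sketched below in Python. Using the peak and a high percentile is an assumption made here for illustration, not guidance from the article, and the function name summarize_counts and the sample values are hypothetical.

```python
def summarize_counts(request_counts, percentile=99):
    """Return the peak and a nearest-rank percentile of per-interval request counts."""
    ordered = sorted(request_counts)
    # Nearest-rank percentile using integer ceiling division (percentile is a whole number).
    rank = max(1, (percentile * len(ordered) + 99) // 100)
    return {"max": ordered[-1], f"p{percentile}": ordered[rank - 1]}


# Hypothetical requestCount / ip_count values, one per (interval, client IP) row.
observed = [12, 18, 25, 31, 40, 44, 52, 60, 75, 310]

summary = summarize_counts(observed, percentile=90)
print(summary)  # {'max': 310, 'p90': 75}
```

A large gap between the percentile and the peak, as in this sample, suggests that a few clients are far busier than the rest and are worth inspecting before settling on a limit. The counts should come from the same interval length as the evaluation window the rule will use.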

AWS
EXPERT
published 3 months ago, 188 views