How do I analyze AWS WAF logs in Amazon Athena?

5 minute read
0

I want to query AWS WAF logs in Amazon Athena.

Resolution

To use Athena with Amazon Simple Storage Service (Amazon S3) to query AWS WAF logs, complete the following steps:

  1. Turn on AWS WAF web ACL traffic logging for your Amazon S3 bucket. Note the values for Target bucket and Target prefix to specify the Amazon S3 location in an Athena query.

  2. Open the Amazon Athena console.
    Note: Before you run your first query, you might need to set up a query result location in Amazon S3

  3. In the Query editor, run the DDL statement CREATE DATABASE to create a database:

    CREATE DATABASE waf_logs_db

    Note: It's a best practice to create the database in the same AWS Region as your S3 bucket.

  4. Create a table schema for the AWS WAF logs in Athena.
    Example table template query with partition projection:

    CREATE EXTERNAL TABLE `waf_logs`(
      `timestamp` bigint,
      `formatversion` int,
      `webaclid` string,
      `terminatingruleid` string,
      `terminatingruletype` string,
      `action` string,
      `terminatingrulematchdetails` array <
                                        struct <
                                            conditiontype: string,
                                            sensitivitylevel: string,
                                            location: string,
                                            matcheddata: array < string >
                                              >
                                         >,
      `httpsourcename` string,
      `httpsourceid` string,
      `rulegrouplist` array <
                          struct <
                              rulegroupid: string,
                              terminatingrule: struct <
                                                  ruleid: string,
                                                  action: string,
                                                  rulematchdetails: array <
                                                                       struct <
                                                                           conditiontype: string,
                                                                           sensitivitylevel: string,
                                                                           location: string,
                                                                           matcheddata: array < string >
                                                                              >
                                                                        >
                                                    >,
                              nonterminatingmatchingrules: array <
                                                                  struct <
                                                                      ruleid: string,
                                                                      action: string,
                                                                      overriddenaction: string,
                                                                      rulematchdetails: array <
                                                                                           struct <
                                                                                               conditiontype: string,
                                                                                               sensitivitylevel: string,
                                                                                               location: string,
                                                                                               matcheddata: array < string >
                                                                                                  >
                                                                                           >
                                                                        >
                                                                 >,
                              excludedrules: string
                                >
                           >,
    `ratebasedrulelist` array <
                             struct <
                                 ratebasedruleid: string,
                                 limitkey: string,
                                 maxrateallowed: int
                                   >
                              >,
      `nonterminatingmatchingrules` array <
                                        struct <
                                            ruleid: string,
                                            action: string,
                                            rulematchdetails: array <
                                                                 struct <
                                                                     conditiontype: string,
                                                                     sensitivitylevel: string,
                                                                     location: string,
                                                                     matcheddata: array < string >
                                                                        >
                                                                 >,
                                            captcharesponse: struct <
                                                                responsecode: string,
                                                                solvetimestamp: string
                                                                 >
                                              >
                                         >,
      `requestheadersinserted` array <
                                    struct <
                                        name: string,
                                        value: string
                                          >
                                     >,
      `responsecodesent` string,
      `httprequest` struct <
                        clientip: string,
                        country: string,
                        headers: array <
                                    struct <
                                        name: string,
                                        value: string
                                          >
                                     >,
                        uri: string,
                        args: string,
                        httpversion: string,
                        httpmethod: string,
                        requestid: string
                          >,
      `labels` array <
                   struct <
                       name: string
                         >
                    >,
      `captcharesponse` struct <
                            responsecode: string,
                            solvetimestamp: string,
                            failureReason: string
                              >,
      `challengeresponse` struct <
                            responsecode: string,
                            solvetimestamp: string,
                            failureReason: string
                            >,
      `ja3Fingerprint` string
    )
    PARTITIONED BY ( 
    `region` string, 
    `date` string) 
    ROW FORMAT SERDE 
      'org.openx.data.jsonserde.JsonSerDe' 
    STORED AS INPUTFORMAT 
      'org.apache.hadoop.mapred.TextInputFormat' 
    OUTPUTFORMAT 
      'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
    LOCATION
      's3://DOC-EXAMPLE-BUCKET/AWSLogs/accountID/WAFLogs/region/DOC-EXAMPLE-WEBACL/'
    TBLPROPERTIES(
     'projection.enabled' = 'true',
     'projection.region.type' = 'enum',
     'projection.region.values' = 'us-east-1,us-west-2,eu-central-1,eu-west-1',
     'projection.date.type' = 'date',
     'projection.date.range' = '2021/01/01,NOW',
     'projection.date.format' = 'yyyy/MM/dd',
     'projection.date.interval' = '1',
     'projection.date.interval.unit' = 'DAYS',
     'storage.location.template' = 's3://DOC-EXAMPLE-BUCKET/AWSLogs/accountID/WAFLogs/${region}/DOC-EXAMPLE-WEBACL/${date}/')

    Note: Replace storage.location.template, projection.region.values, projection.date.range, DOC-EXAMPLE-BUCKET, and DOC-EXAMPLE-WEBACL with your variables.

    Example table template query without partition projection:

    CREATE EXTERNAL TABLE `waf_logs`(
      `timestamp` bigint,
      `formatversion` int,
      `webaclid` string,
      `terminatingruleid` string,
      `terminatingruletype` string,
      `action` string,
      `terminatingrulematchdetails` array <
                                        struct <
                                            conditiontype: string,
                                            sensitivitylevel: string,
                                            location: string,
                                            matcheddata: array < string >
                                              >
                                         >,
      `httpsourcename` string,
      `httpsourceid` string,
      `rulegrouplist` array <
                          struct <
                              rulegroupid: string,
                              terminatingrule: struct <
                                                  ruleid: string,
                                                  action: string,
                                                  rulematchdetails: array <
                                                                       struct <
                                                                           conditiontype: string,
                                                                           sensitivitylevel: string,
                                                                           location: string,
                                                                           matcheddata: array < string >
                                                                              >
                                                                        >
                                                    >,
                              nonterminatingmatchingrules: array <
                                                                  struct <
                                                                      ruleid: string,
                                                                      action: string,
                                                                      overriddenaction: string,
                                                                      rulematchdetails: array <
                                                                                           struct <
                                                                                               conditiontype: string,
                                                                                               sensitivitylevel: string,
                                                                                               location: string,
                                                                                               matcheddata: array < string >
                                                                                                  >
                                                                                           >
                                                                        >
                                                                 >,
                              excludedrules: string
                                >
                           >,
    `ratebasedrulelist` array <
                             struct <
                                 ratebasedruleid: string,
                                 limitkey: string,
                                 maxrateallowed: int
                                   >
                              >,
      `nonterminatingmatchingrules` array <
                                        struct <
                                            ruleid: string,
                                            action: string,
                                            rulematchdetails: array <
                                                                 struct <
                                                                     conditiontype: string,
                                                                     sensitivitylevel: string,
                                                                     location: string,
                                                                     matcheddata: array < string >
                                                                        >
                                                                 >,
                                            captcharesponse: struct <
                                                                responsecode: string,
                                                                solvetimestamp: string
                                                                 >
                                              >
                                         >,
      `requestheadersinserted` array <
                                    struct <
                                        name: string,
                                        value: string
                                          >
                                     >,
      `responsecodesent` string,
      `httprequest` struct <
                        clientip: string,
                        country: string,
                        headers: array <
                                    struct <
                                        name: string,
                                        value: string
                                          >
                                     >,
                        uri: string,
                        args: string,
                        httpversion: string,
                        httpmethod: string,
                        requestid: string
                          >,
      `labels` array <
                   struct <
                       name: string
                         >
                    >,
      `captcharesponse` struct <
                            responsecode: string,
                            solvetimestamp: string,
                            failureReason: string
                              >,
      `challengeresponse` struct <
                              responsecode: string,
                              solvetimestamp: string,
                              failureReason: string
                              >,
      `ja3Fingerprint` string
    )
    ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
    STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat'
    OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
    LOCATION 's3://DOC-EXAMPLE-BUCKET/prefix/'

    Note: Replace DOC-EXAMPLE-BUCKET with your variable.

  5. In the navigation pane, under Tables, choose Preview table. If you see data from the AWS WAF in the Results window, then you successfully created the Athena table. The data are values such as, formatversion, webaclid, httpsourcename, and ja3Fingerprint).

Example queries

Use the following example queries to analyze your AWS WAF log files. You can also create your own queries for your use case.

Count the matched IP addresses that aligned with excluded rules in the last 10 days

WITH test_dataset AS 
  (SELECT * FROM waf_logs 
    CROSS JOIN UNNEST(rulegrouplist) AS t(allrulegroups))
SELECT 
  COUNT(*) AS count, 
  "httprequest"."clientip", 
  "allrulegroups"."excludedrules",
  "allrulegroups"."ruleGroupId"
FROM test_dataset 
WHERE allrulegroups.excludedrules IS NOT NULL AND from_unixtime(timestamp/1000) > now() - interval '10' day
GROUP BY "httprequest"."clientip", "allrulegroups"."ruleGroupId", "allrulegroups"."excludedrules"
ORDER BY count DESC

Return records for a specified date range and IP address

SELECT * 
FROM waf_logs 
WHERE httprequest.clientip='53.21.198.66' AND "date" >= '2022/03/01' AND "date" < '2022/03/31'

Count the number of times a request was blocked, grouped by specific attributes

SELECT 
  COUNT(*) AS count,
  webaclid,
  terminatingruleid,
  httprequest.clientip,
  httprequest.uri
FROM waf_logs
WHERE action='BLOCK'
GROUP BY webaclid, terminatingruleid, httprequest.clientip, httprequest.uri
ORDER BY count DESC
LIMIT 100;

Group all counted custom rules by number of times matched

SELECT
  count(*) AS count,
         httpsourceid,
         httprequest.clientip,
         t.ruleid,
         t.action
FROM "waf_logs" 
CROSS JOIN UNNEST(nonterminatingmatchingrules) AS t(t) 
WHERE action <> 'BLOCK' AND cardinality(nonTerminatingMatchingRules) > 0 
GROUP BY t.ruleid, t.action, httpsourceid, httprequest.clientip 
ORDER BY "count" DESC
Limit 50

For more information, see Querying AWS WAF logs.

Related information

Troubleshooting in Athena

How do I analyze my Amazon S3 server access logs using Athena?

AWS OFFICIAL
AWS OFFICIALUpdated 3 months ago