
How do I use Athena to analyze my Application Load Balancer access logs?

3 minute read

I want to use Amazon Athena to analyze my Application Load Balancer access logs.

Short description

By default, Elastic Load Balancing (ELB) doesn't activate access logs. When you activate access logs, you must specify an Amazon Simple Storage Service (Amazon S3) bucket, and ELB stores the logs in that bucket. You can then use Athena to analyze the Application Load Balancer or Classic Load Balancer access logs stored in the S3 bucket.

Resolution

Create a database and table for Application Load Balancer logs

Complete the following steps:

  1. Open the Athena console.

  2. To create a database in the same AWS Region as the S3 bucket, run the following command in the query editor:

    CREATE DATABASE alb_db
  3. To create a table in the database, use one of the following options:
    Use partition projection to create an alb_logs table for the Application Load Balancer access logs.
    -or-
    To create a table with partitions that Athena stores in the AWS Glue Data Catalog, run the following query:

    CREATE EXTERNAL TABLE IF NOT EXISTS alb_logs_partitioned(type string,
        time string,
        elb string,
        client_ip string,
        client_port int,
        target_ip string,
        target_port int,
        request_processing_time double,
        target_processing_time double,
        response_processing_time double,
        elb_status_code int,
        target_status_code string,
        received_bytes bigint,
        sent_bytes bigint,
        request_verb string,
        request_url string,
        request_proto string,
        user_agent string,
        ssl_cipher string,
        ssl_protocol string,
        target_group_arn string,
        trace_id string,
        domain_name string,
        chosen_cert_arn string,
        matched_rule_priority string,
        request_creation_time string,
        actions_executed string,
        redirect_url string,
        lambda_error_reason string,
        target_port_list string,
        target_status_code_list string,
        classification string,
        classification_reason string,
        conn_trace_id string
    )
    PARTITIONED BY(day string)
    ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
    WITH SERDEPROPERTIES(
        'serialization.format' = '1',
        'input.regex' ='([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*):([0-9]*) ([^ ]*)[:-]([0-9]*) ([-.0-9]*) ([-.0-9]*) ([-.0-9]*) (|[-0-9]*) (-|[-0-9]*) ([-0-9]*) ([-0-9]*) \"([^ ]*) (.*) (- |[^ ]*)\" \"([^\"]*)\" ([A-Z0-9-_]+) ([A-Za-z0-9.-]*) ([^ ]*) \"([^\"]*)\" \"([^\"]*)\" \"([^\"]*)\" ([-.0-9]*) ([^ ]*) \"([^\"]*)\" \"([^\"]*)\" \"([^ ]*)\" \"([^\s]+?)\" \"([^\s]+)\" \"([^ ]*)\" \"([^ ]*)\" ?([^ ]*)?( .*)?')
    LOCATION 's3://amzn-s3-demo-bucket/AWSLogs/ACCOUNT-ID/elasticloadbalancing/REGION/'

    Note: Replace alb_logs_partitioned with your table name. Also, replace amzn-s3-demo-bucket with your bucket name, ACCOUNT-ID with your AWS account ID, and REGION with your Region.

  4. (Optional) If your table uses partition projection, then skip this step and proceed to step 5. Otherwise, run one of the following data definition language (DDL) statements to load the partitions. Don't use an AWS Glue crawler on the Application Load Balancer logs.
    For Hive style partitions, run the following MSCK REPAIR TABLE DDL statement:

    MSCK REPAIR TABLE table_name

    Note: Replace table_name with your table name.
    If you don't have Hive style partitions, then run the following ALTER TABLE ADD PARTITION DDL statement:

    ALTER TABLE alb_logs_partitioned ADD PARTITION (day = '2022/05/21') LOCATION 's3://amzn-s3-demo-bucket/AWSLogs/ACCOUNT-ID/elasticloadbalancing/REGION/2022/05/21/'

    Note: Replace alb_logs_partitioned with your table name. Also, replace amzn-s3-demo-bucket with your bucket name, ACCOUNT-ID with your account ID, and REGION with your Region.

  5. In the navigation pane, expand Tables, and then choose Preview table.

  6. In the Results window, view the data from the Application Load Balancer access logs.
    Note: To run SQL statements on the table, use the Query editor.
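As a starting point, the following queries run against the alb_logs_partitioned table created above. They are sketches to adapt, not part of the original article: the table name and the day value are placeholders for your own setup, and the second query assumes you want per-target traffic share across all loaded partitions.

```sql
-- Requests per HTTP method and client IP for one day of logs
SELECT COUNT(request_verb) AS count, request_verb, client_ip
FROM alb_logs_partitioned
WHERE day = '2022/05/21'
GROUP BY request_verb, client_ip
ORDER BY count DESC;

-- Percentage of traffic sent to each target.
-- Note: elb_status_code is declared as int in this table, so filter
-- with elb_status_code = 503, not elb_status_code = '503'.
SELECT target_ip,
       (COUNT(target_ip) * 100.0 / (SELECT COUNT(*) FROM alb_logs_partitioned))
           AS backend_traffic_percentage
FROM alb_logs_partitioned
GROUP BY target_ip
ORDER BY backend_traffic_percentage DESC;
```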

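Step 3 names the partition projection option but doesn't show it. A sketch of what that DDL could look like, reusing the column list and regex from the partitioned table above and adding Athena partition projection properties; the bucket, account ID, Region, and the projection date range are placeholders to adjust, and the input.regex value is elided here because it is identical to the one in the partitioned table:

```sql
CREATE EXTERNAL TABLE IF NOT EXISTS alb_logs (
    type string, time string, elb string, client_ip string, client_port int,
    target_ip string, target_port int, request_processing_time double,
    target_processing_time double, response_processing_time double,
    elb_status_code int, target_status_code string, received_bytes bigint,
    sent_bytes bigint, request_verb string, request_url string,
    request_proto string, user_agent string, ssl_cipher string,
    ssl_protocol string, target_group_arn string, trace_id string,
    domain_name string, chosen_cert_arn string, matched_rule_priority string,
    request_creation_time string, actions_executed string, redirect_url string,
    lambda_error_reason string, target_port_list string,
    target_status_code_list string, classification string,
    classification_reason string, conn_trace_id string
)
PARTITIONED BY (day string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
    'serialization.format' = '1',
    'input.regex' = '...same regex as the alb_logs_partitioned table above...'
)
LOCATION 's3://amzn-s3-demo-bucket/AWSLogs/ACCOUNT-ID/elasticloadbalancing/REGION/'
TBLPROPERTIES (
    -- Partition projection: Athena computes partitions from these rules,
    -- so no MSCK REPAIR TABLE or ALTER TABLE ADD PARTITION is needed.
    'projection.enabled' = 'true',
    'projection.day.type' = 'date',
    'projection.day.range' = '2022/01/01,NOW',
    'projection.day.format' = 'yyyy/MM/dd',
    'projection.day.interval' = '1',
    'projection.day.interval.unit' = 'DAYS',
    'storage.location.template' = 's3://amzn-s3-demo-bucket/AWSLogs/ACCOUNT-ID/elasticloadbalancing/REGION/${day}'
)
```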
Related information

Access log entries

Query Application Load Balancer logs

AWS OFFICIAL · Updated 8 months ago
13 Comments

This is great but requires some corrections:

For example, SELECT * FROM alb_log WHERE elb_status_code = '503'; will fail with a "TYPE_MISMATCH: line 1:46: Cannot apply operator: integer = varchar(3)" error,

and we should use the following instead: SELECT * FROM alb_log WHERE elb_status_code = 503;

replied 3 years ago

Also this query:

SELECT COUNT(request_verb) AS count, request_verb, client_ip FROM alb_log_partition_projection WHERE day = '2022/05/21' GROUP BY request_verb, client_ip;

Needs to be changed to:

SELECT COUNT(request_verb) AS count, request_verb, client_ip FROM alb_log WHERE day = '2022/05/21' GROUP BY request_verb, client_ip;

Not a major issue; it's just the table name according to your tutorial.

replied 3 years ago

Thank you for your comment. We'll review and update the Knowledge Center article as needed.

AWS
EXPERT
replied 3 years ago

Create table queries missing REGION from the S3 path

replied 3 years ago

Thank you for your comment. We'll review and update the Knowledge Center article as needed.

AWS
MODERATOR
replied 3 years ago

The region is still missing from the S3 paths in the first CREATE TABLE query. Without this, the table doesn't work and returns no results.

replied 3 years ago

Thank you for your comment. We'll review and update the Knowledge Center article as needed.

AWS
EXPERT
replied 3 years ago

The first create table in step 3 has an extra single quote in the LOCATION line:

LOCATION ''s3://<your-al...

replied 2 years ago

This example in the article:

SELECT target_ip, (Count(target_ip)* 100.0 / (Select Count(*) From alb_logs)) as backend_traffic_percentage FROM alb_logs GROUP by target_ip< ORDER By count() DESC;

Gives this error when running:

error - line 5:1: mismatched input '<'. Expecting: <predicate>

I was able to resolve it by removing the < after target_ip on line 4.

SELECT target_ip, (Count(target_ip)* 100.0 / (Select Count(*) From alb_logs)) as backend_traffic_percentage FROM alb_logs GROUP by target_ip ORDER By count() DESC;

AWS
replied 2 years ago

Thank you for your comment. We'll review and update the Knowledge Center article as needed.

AWS
EXPERT
replied 2 years ago

Is there something that needs to be done to load the log files into athena? I followed this article step by step. I have the table created but I do not see the logs. I have logs in S3. What is the secret?

replied 2 years ago

Thank you for your comment. We'll review and update the Knowledge Center article as needed.

AWS
EXPERT
replied 2 years ago