I want to use Amazon Athena to query my Application Load Balancer connection logs.
Resolution
To query Application Load Balancer connection logs, create a database and table for Athena. Complete the following steps:
-
Open the Athena console.
-
To create a database, run the following command in the Query Editor:
CREATE DATABASE alb_db
Note: It's a best practice to create the database in the same AWS Region as your Amazon Simple Storage Service (Amazon S3) bucket.
-
After you create the database, create a table with the following query:
CREATE EXTERNAL TABLE IF NOT EXISTS alb_connection_logs (
time string,
client_ip string,
client_port int,
listener_port int,
tls_protocol string,
tls_cipher string,
tls_handshake_latency double,
leaf_client_cert_subject string,
leaf_client_cert_validity string,
leaf_client_cert_serial_number string,
tls_verify_status string,
conn_trace_id string
)
PARTITIONED BY
(
day STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
'serialization.format' = '1',
'input.regex' =
'([^ ]*) ([^ ]*) ([0-9]*) ([0-9]*) ([A-Za-z0-9.-]*) ([^ ]*) ([-.0-9]*) \"([^\"]*)\" ([^ ]*) ([^ ]*) ([^ ]*) ?([^ ]*)?( .*)?'
)
LOCATION 's3://S3-LOCATION/AWSLogs/ACCOUNT-NUMBER/elasticloadbalancing/REGION/'
TBLPROPERTIES
(
"projection.enabled" = "true",
"projection.day.type" = "date",
"projection.day.range" = "2023/01/01,NOW",
"projection.day.format" = "yyyy/MM/dd",
"projection.day.interval" = "1",
"projection.day.interval.unit" = "DAYS",
"storage.location.template" = "s3://S3-LOCATION/AWSLogs/ACCOUNT-NUMBER/elasticloadbalancing/REGION/${day}"
)
Note: Replace S3-LOCATION, ACCOUNT-NUMBER, and REGION with your S3 bucket values.
-
To query the Application Load Balancer connection logs, use SQL statements.
Example queries
View the latest 100 occurrences of logs
SELECT *FROM alb_conn_logs
ORDER by time desc
LIMIT 100
Count occurrences where tls_verify_status was NOT 'Success' and group by client_IP
SELECT distinct client_ip, count() as count from alb_conn_logs WHERE tls_verify_status != 'Success'
GROUP by client_ip
ORDER by count() DESC;
View all occurrences where tls_handshake_latency is over two seconds in a specific time range
SELECT * FROM alb_conn_logs WHERE (parse_datetime(time,'yyyy-MM-dd''T''HH:mm:ss.SSSSSS''Z')
BETWEEN parse_datetime('2024-01-01-00:00:00','yyyy-MM-dd-HH:mm:ss')
AND parse_datetime('2024-03-20-00:00:00','yyyy-MM-dd-HH:mm:ss'))
AND (tls_handshake_latency >= 2.0);
Check for connections with an exact certification validity date and group by leaf_client_cert_serial_number
SELECT DISTINCT leaf_client_cert_serial_number, count() AS count FROM alb_conn_logs
WHERE tls_verify_status = 'Success' AND leaf_client_cert_validity like '%NotAfter=2025-01-01%'
GROUP BY leaf_client_cert_serial_number
ORDER BY count() DESC;
Related information
How do I use Athena to analyze my Application Load Balancer access logs?
Query Application Load Balancer logs