How do I analyze my Application Load Balancer access logs using Amazon Athena?
I want to analyze my Application Load Balancer access logs with Amazon Athena.
Short description
Elastic Load Balancing doesn't activate access logging by default. When you activate access logging, you specify an Amazon Simple Storage Service (Amazon S3) bucket, and all Application Load Balancer and Classic Load Balancer access logs are stored in that bucket. To troubleshoot or analyze the performance of your load balancer, you can use Athena to query the access logs in Amazon S3.
Note: Although you can use Athena to analyze access logs for both Application Load Balancers and Classic Load Balancers, only Application Load Balancers are covered in this article.
Resolution
Create a database and table for Application Load Balancer logs
To analyze access logs in Athena, create a database and table by doing the following:
1. Open the Athena console.
2. In the Query Editor, run a command similar to the following to create a database. It's a best practice to create the database in the same AWS Region as the S3 bucket.
create database alb_db
3. In the database that you created in the previous step, create a table (for example, alb_logs) for the Application Load Balancer logs. For more information, see Creating the table for Application Load Balancer logs.
Note: For better query performance, you can choose to create a table with partition projection. In partition projection, partition values and locations are calculated from configuration rather than read from a repository, such as the AWS Glue Data Catalog. For more information, see Partition projection with Amazon Athena.
CREATE EXTERNAL TABLE IF NOT EXISTS alb_logs (
    type string,
    time string,
    elb string,
    client_ip string,
    client_port int,
    target_ip string,
    target_port int,
    request_processing_time double,
    target_processing_time double,
    response_processing_time double,
    elb_status_code int,
    target_status_code string,
    received_bytes bigint,
    sent_bytes bigint,
    request_verb string,
    request_url string,
    request_proto string,
    user_agent string,
    ssl_cipher string,
    ssl_protocol string,
    target_group_arn string,
    trace_id string,
    domain_name string,
    chosen_cert_arn string,
    matched_rule_priority string,
    request_creation_time string,
    actions_executed string,
    redirect_url string,
    lambda_error_reason string,
    target_port_list string,
    target_status_code_list string,
    classification string,
    classification_reason string
)
PARTITIONED BY (day STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
    'serialization.format' = '1',
    'input.regex' = '([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*):([0-9]*) ([^ ]*)[:-]([0-9]*) ([-.0-9]*) ([-.0-9]*) ([-.0-9]*) (|[-0-9]*) (-|[-0-9]*) ([-0-9]*) ([-0-9]*) \"([^ ]*) (.*) (- |[^ ]*)\" \"([^\"]*)\" ([A-Z0-9-_]+) ([A-Za-z0-9.-]*) ([^ ]*) \"([^\"]*)\" \"([^\"]*)\" \"([^\"]*)\" ([-.0-9]*) ([^ ]*) \"([^\"]*)\" \"([^\"]*)\" \"([^ ]*)\" \"([^\s]+?)\" \"([^\s]+)\" \"([^ ]*)\" \"([^ ]*)\"'
)
LOCATION 's3://your-alb-logs-directory/AWSLogs/1111222233334444/elasticloadbalancing/<REGION>/'
TBLPROPERTIES (
    "projection.enabled" = "true",
    "projection.day.type" = "date",
    "projection.day.range" = "2022/01/01,NOW",
    "projection.day.format" = "yyyy/MM/dd",
    "projection.day.interval" = "1",
    "projection.day.interval.unit" = "DAYS",
    "storage.location.template" = "s3://your-alb-logs-directory/AWSLogs/1111222233334444/elasticloadbalancing/<REGION>/${day}"
)
Be sure to replace the table name and S3 locations according to your use case.
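After you create the table, you can run a quick check that filters on the day partition column to confirm that partition projection resolves your S3 locations. This is a minimal sketch; the table name alb_logs and the date value are examples that you might need to adjust for your environment.
SELECT elb_status_code, count(*) AS requests
FROM alb_logs
WHERE day = '2022/05/21'
GROUP BY elb_status_code
LIMIT 10;
Because the query filters on day, Athena scans only the log objects under that date prefix, which reduces the amount of data scanned.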
Or, you can create a partitioned table with the following query, and then load the partitions with the ALTER TABLE ADD PARTITION command.
CREATE EXTERNAL TABLE IF NOT EXISTS alb_logs_partitioned (
    type string,
    time string,
    elb string,
    client_ip string,
    client_port int,
    target_ip string,
    target_port int,
    request_processing_time double,
    target_processing_time double,
    response_processing_time double,
    elb_status_code string,
    target_status_code string,
    received_bytes bigint,
    sent_bytes bigint,
    request_verb string,
    request_url string,
    request_proto string,
    user_agent string,
    ssl_cipher string,
    ssl_protocol string,
    target_group_arn string,
    trace_id string,
    domain_name string,
    chosen_cert_arn string,
    matched_rule_priority string,
    request_creation_time string,
    actions_executed string,
    redirect_url string,
    lambda_error_reason string,
    target_port_list string,
    target_status_code_list string,
    classification string,
    classification_reason string
)
PARTITIONED BY (day string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
    'serialization.format' = '1',
    'input.regex' = '([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*):([0-9]*) ([^ ]*)[:-]([0-9]*) ([-.0-9]*) ([-.0-9]*) ([-.0-9]*) (|[-0-9]*) (-|[-0-9]*) ([-0-9]*) ([-0-9]*) \"([^ ]*) ([^ ]*) (- |[^ ]*)\" \"([^\"]*)\" ([A-Z0-9-]+) ([A-Za-z0-9.-]*) ([^ ]*) \"([^\"]*)\" \"([^\"]*)\" \"([^\"]*)\" ([-.0-9]*) ([^ ]*) \"([^\"]*)\" \"([^\"]*)\" \"([^ ]*)\" \"([^\s]+?)\" \"([^\s]+)\" \"([^ ]*)\" \"([^ ]*)\"'
)
LOCATION 's3://my_log_bucket/AWSLogs/1111222233334444/elasticloadbalancing/us-east-1/'
ALTER TABLE alb_logs_partitioned ADD PARTITION (day = '2022/05/21')
LOCATION 's3://my_log_bucket/AWSLogs/1111222233334444/elasticloadbalancing/us-east-1/2022/05/21/'
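To confirm which partitions the table contains, or to load several days at once, you can run statements similar to the following. This is a hedged sketch; the bucket, account ID, Region, and dates are placeholders that you must adjust to match your environment.
SHOW PARTITIONS alb_logs_partitioned;

ALTER TABLE alb_logs_partitioned ADD IF NOT EXISTS
  PARTITION (day = '2022/05/22') LOCATION 's3://my_log_bucket/AWSLogs/1111222233334444/elasticloadbalancing/us-east-1/2022/05/22/'
  PARTITION (day = '2022/05/23') LOCATION 's3://my_log_bucket/AWSLogs/1111222233334444/elasticloadbalancing/us-east-1/2022/05/23/';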
Note: It's not a best practice to use an AWS Glue crawler on the Application Load Balancer logs.
4. Under Tables in the navigation pane, choose Preview table from the menu button that's next to the table name. The data from the Application Load Balancer access logs is visible in the Results window.
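The Preview table action runs a query similar to the following. You can also run it yourself in the Query Editor to spot-check the parsed columns; the table name alb_logs is the example from earlier.
SELECT time, elb_status_code, request_verb, request_url, client_ip
FROM alb_logs
LIMIT 10;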
5. Use the Query editor to run SQL statements on the table. You can save queries, view previous queries, or download query results in CSV format.
Example queries
In the following examples, be sure to modify the table name, column values, and other variables to fit your query.
Action | Query |
View the first 100 access log entries in chronological order. Use case: analysis and troubleshooting | SELECT * FROM alb_log ORDER BY time ASC LIMIT 100; |
List all client IP addresses that accessed the Application Load Balancer, and how many times they accessed the Application Load Balancer. Use case: analysis and troubleshooting | SELECT DISTINCT client_ip, count(*) AS count FROM alb_log GROUP BY client_ip ORDER BY count(*) DESC; |
Calculate the average amount of data (in kilobytes) that passes through the Application Load Balancer in request/response pairs. Use case: analysis and troubleshooting | SELECT (avg(sent_bytes)/1000.0 + avg(received_bytes)/1000.0) AS prewarm_kilobytes FROM alb_log; |
List all targets that the Application Load Balancer routes traffic to and how many times the Application Load Balancer has routed requests to each target, by percentage distribution. Use case: identify potential target traffic imbalances | SELECT target_ip, (count(target_ip) * 100.0 / (SELECT count(*) FROM alb_log)) AS backend_traffic_percentage FROM alb_log GROUP BY target_ip ORDER BY count(*) DESC; |
List the times that a client sent a request to the Application Load Balancer and then closed the connection before the idle timeout elapsed (HTTP 460 error). Use case: troubleshoot HTTP 460 errors | SELECT * FROM alb_log WHERE elb_status_code = '460'; |
List the times that a client request wasn't routed because the listener rule forwarded the request to an empty target group (HTTP 503 error). Use case: troubleshoot HTTP 503 errors | SELECT * FROM alb_log WHERE elb_status_code = '503'; |
List clients in descending order, by the number of times that each client visited a specified URL. Use case: analyze traffic patterns | SELECT client_ip, elb, request_url, count(*) AS count FROM alb_log GROUP BY client_ip, elb, request_url ORDER BY count DESC; |
List the 10 URLs that Firefox users accessed most frequently, in descending order. Use case: analyze traffic distribution and patterns | SELECT request_url, user_agent, count(*) AS count FROM alb_log WHERE user_agent LIKE '%Firefox%' GROUP BY request_url, user_agent ORDER BY count(*) DESC LIMIT 10; |
List clients in descending order, by the amount of data (in megabytes) that each client sent in their requests to the Application Load Balancer. Use case: analyze traffic distribution and patterns | SELECT client_ip, sum(received_bytes/1000000.0) AS client_datareceived_megabytes FROM alb_log GROUP BY client_ip ORDER BY client_datareceived_megabytes DESC; |
List each time in a specified date range when the target processing time was more than 5 seconds. Use case: troubleshoot latency in a specified time frame | SELECT * FROM alb_log WHERE (parse_datetime(time,'yyyy-MM-dd''T''HH:mm:ss.SSSSSS''Z') BETWEEN parse_datetime('2018-08-08-00:00:00','yyyy-MM-dd-HH:mm:ss') AND parse_datetime('2018-08-08-02:00:00','yyyy-MM-dd-HH:mm:ss')) AND (target_processing_time >= 5.0); |
Count the number of HTTP GET requests received by the load balancer, grouped by the client IP address. Use case: analyze incoming traffic distribution | SELECT count(request_verb) AS count, request_verb, client_ip FROM alb_log_partition_projection WHERE day = '2022/05/21' GROUP BY request_verb, client_ip; |
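Building on the examples above, you can combine partition pruning with error analysis. The following query is a sketch that assumes the partition projection table created earlier (named alb_logs here, where elb_status_code is an int column); it counts HTTP 5xx responses per target for a single day so that Athena scans only that day's log objects.
SELECT target_ip, elb_status_code, count(*) AS error_count
FROM alb_logs
WHERE day = '2022/05/21'
  AND elb_status_code BETWEEN 500 AND 599
GROUP BY target_ip, elb_status_code
ORDER BY error_count DESC;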