How do I use Athena to analyze Amazon VPC flow logs?
I want to use Amazon Athena to analyze my Amazon Virtual Private Cloud (Amazon VPC) flow logs.
Resolution
You can use the Athena console query editor to create a database, create a table for the VPC flow logs, and run example queries. Then, use the flow logs to analyze network traffic patterns and to identify threats and risks across your Amazon VPC network.
Create a database with Athena console query editor
Open the Athena console query editor, and then run the following command:
CREATE DATABASE test_db_vpclogs;
Note: Replace test_db_vpclogs with the name that you want to use for your database.
Important: It's a best practice to create the database in the same AWS Region as the Amazon Simple Storage Service (Amazon S3) bucket that the flow logs are stored in.
Create a table for the flow logs in the database
Open the Athena console. Then, in the navigation pane, choose Query editor. In the Athena console query editor, run a command similar to the following example:
CREATE EXTERNAL TABLE IF NOT EXISTS test_table_vpclogs ( version int, account_id string, interface_id string, srcaddr string, dstaddr string, srcport int, dstport int, protocol bigint, packets bigint, bytes bigint, start bigint, `end` bigint, action string, log_status string, vpc_id string, subnet_id string, instance_id string, tcp_flags int, type string, pkt_srcaddr string, pkt_dstaddr string, az_id string, sublocation_type string, sublocation_id string, pkt_src_aws_service string, pkt_dst_aws_service string, flow_direction string, traffic_path int ) PARTITIONED BY (region string, date string, hour string) ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' LOCATION 's3://amzn-s3-demo-bucket/prefix/AWSLogs/{account_id}/vpcflowlogs/' TBLPROPERTIES ( "EXTERNAL"="true", "skip.header.line.count" = "1", "projection.enabled" = "true", "projection.region.type" = "enum", "projection.region.values" = "us-east-1,us-west-2,ap-south-1,eu-west-1", "projection.date.type" = "date", "projection.date.range" = "2021/01/01,NOW", "projection.date.format" = "yyyy/MM/dd", "projection.hour.type" = "integer", "projection.hour.range" = "00,23", "projection.hour.digits" = "2", "storage.location.template" = "s3://amzn-s3-demo-bucket/prefix/AWSLogs/${account_id}/vpcflowlogs/${region}/${date}/${hour}" )
Note: Replace test_table_vpclogs with the name that you want for your table.
Make sure that you modify the LOCATION parameter to point to the Amazon S3 bucket that contains your log data. Update projection.region.values to the AWS Regions that you have VPC flow logs. Set the value of projection.date.range to the earliest available logs in your data. For example, if your logs start on January 1, 2021, then set projection.date.range to '2021/01/01,NOW'. Modify the value according to the actual start date of your logs for accurate data projection. The preceding command uses partition projection to create a table, partition the table, and automatically populate the partitions. If a projected partition doesn't exist in Amazon S3, then Athena still projects the partition. It's a best practice to use partitioned attributes in your queries.
Run SQL statements on the table for the flow logs
Open the Athena console. Then, in the navigation pane, choose Query editor. Use the Athena console query editor to run SQL statements on the table. You can save the queries, view previous queries, or download query results in .csv format.
Example queries
Note: In the following example queries, replace test_table_vpclogs with the name of your table. Modify the column values and other variables for your use case.
To view the first 100 flow log entries in chronological order for a specified period of time, run the following query:
SELECT * FROM test_table_vpclogs WHERE day >= '2021/02/01' AND day < '2021/02/28' ORDER BY day ASC LIMIT 100;
To view the 10 destination servers that receive the most HTTPS packets for a specified period of time, run the following query:
SELECT SUM(packets) AS packetcount, dstaddr FROM test_table_vpclogs WHERE dstport = 443 AND day >= '2021/03/01' AND day < '2021/03/31' GROUP BY dstaddr ORDER BY packetcount DESC LIMIT 10;
To check the logs that are created for a specified time range, run the following query:
SELECT interface_id, srcaddr, action, protocol, to_iso8601(from_unixtime(start)) AS start_time, to_iso8601(from_unixtime("end")) AS end_time FROM test_table_vpclogs WHERE DAY >= '2021/04/01' AND DAY < '2021/04/30';
To view flow logs for a specific source IP address between a specified time range, run the following query:
SELECT * FROM test_table_vpclogs WHERE srcaddr = '10.117.1.22' AND day >= '2021/02/01' AND day < '2021/02/28';
To list the rejected TCP connections between a specified time range, run the following query:
SELECT day, interface_id, srcaddr, action, protocol FROM test_table_vpclogs WHERE action = 'REJECT' AND protocol = 6 AND day >= '2021/02/01' AND day < '2021/02/28' LIMIT 10;
To view the flow logs for the IP address range that starts with 10.117, run the following query:
SELECT * FROM test_table_vpclogs WHERE split_part(srcaddr,'.', 1)='10' AND split_part(srcaddr,'.', 2) ='117'
To view the flow logs for a specific destination IP address between a time range, run the following query:
SELECT * FROM test_table_vpclogs WHERE dstaddr = '10.0.1.14' AND day >= '2021/01/01' AND day < '2021/01/31'
Related information
How do I monitor traffic in my VPC with flow logs?
Analyzing VPC Flow Logs using Amazon Athena and Amazon QuickSight
- Topics
- Analytics
- Tags
- Amazon Athena
- Language
- English
Related videos


Relevant content
- asked 4 years ago
- Accepted Answerasked 9 months ago
AWS OFFICIALUpdated 2 months ago