How can I analyze custom VPC Flow Logs using CloudWatch Logs Insights?
I have configured custom VPC Flow Logs. How can I discover patterns and trends with Amazon CloudWatch Logs Insights?
Short description
You can use CloudWatch Logs Insights to analyze VPC Flow Logs. CloudWatch Logs Insights automatically discovers fields in many Amazon-provided logs, as well as in JSON-formatted log events, to allow for easy query construction and log exploration. VPC Flow Logs that are in the default format are automatically discovered by CloudWatch Logs Insights.
However, VPC Flow Logs that are delivered in a custom format aren't automatically discovered, so you must modify the queries accordingly. This article provides several example queries that you can customize and extend to match your use cases.
This custom VPC Flow Logs format is used:
${account-id} ${vpc-id} ${subnet-id} ${interface-id} ${instance-id} ${srcaddr} ${srcport} ${dstaddr} ${dstport} ${protocol} ${packets} ${bytes} ${action} ${log-status} ${start} ${end} ${flow-direction} ${traffic-path} ${tcp-flags} ${pkt-srcaddr} ${pkt-src-aws-service} ${pkt-dstaddr} ${pkt-dst-aws-service} ${region} ${az-id} ${sublocation-type} ${sublocation-id}
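As a reference for the queries that follow, here is a minimal Python sketch of how each space-separated token in a custom-format flow log event maps to a field name. The field order comes from the format string above; the sample record values are illustrative only, not taken from a real log.

```python
# Field names in the same order as the custom VPC Flow Logs format above.
FIELDS = [
    "account_id", "vpc_id", "subnet_id", "interface_id", "instance_id",
    "srcaddr", "srcport", "dstaddr", "dstport", "protocol", "packets",
    "bytes", "action", "log_status", "start", "end", "flow_direction",
    "traffic_path", "tcp_flags", "pkt_srcaddr", "pkt_src_aws_service",
    "pkt_dstaddr", "pkt_dst_aws_service", "region", "az_id",
    "sublocation_type", "sublocation_id",
]

def parse_flow_log(message: str) -> dict:
    """Split a custom-format flow log event into named fields."""
    return dict(zip(FIELDS, message.split(" ")))

# Illustrative record with 27 space-separated values.
sample = ("123456789012 vpc-0b69ce8d04278ddd subnet-002bdfe1767d0ddb0 "
          "eni-0435cbb62960f230e - 172.31.0.104 55125 91.240.118.81 443 6 "
          "10 840 ACCEPT OK 1648826580 1648826640 egress - 19 172.31.0.104 "
          "- 91.240.118.81 - us-east-1 use1-az2 - -")
record = parse_flow_log(sample)
```

This positional mapping is exactly what the `parse @message "* * … *"` clause performs in the queries below: each `*` captures one token, in order.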
Resolution
Retrieve latest VPC Flow Logs
Because CloudWatch Logs Insights doesn't automatically discover the log fields, you must use the parse keyword to isolate the desired fields. In this query, the results are sorted by the flow log event start time and restricted to the two most recent log entries.
Query
#Retrieve latest custom VPC Flow Logs
parse @message "* * * * * * * * * * * * * * * * * * * * * * * * * * *" as account_id, vpc_id, subnet_id, interface_id, instance_id, srcaddr, srcport, dstaddr, dstport, protocol, packets, bytes, action, log_status, start, end, flow_direction, traffic_path, tcp_flags, pkt_srcaddr, pkt_src_aws_service, pkt_dstaddr, pkt_dst_aws_service, region, az_id, sublocation_type, sublocation_id
| sort start desc
| limit 2
Results
account_id | vpc_id | subnet_id | interface_id | instance_id | srcaddr | srcport |
---|---|---|---|---|---|---|
123456789012 | vpc-0b69ce8d04278ddd | subnet-002bdfe1767d0ddb0 | eni-0435cbb62960f230e | - | 172.31.0.104 | 55125 |
123456789012 | vpc-0b69ce8d04278ddd1 | subnet-002bdfe1767d0ddb0 | eni-0435cbb62960f230e | - | 91.240.118.81 | 49422 |
Summarize data transfers by source/destination IP address pairs
Next, summarize the network traffic by source/destination IP address pair. In this example, the sum statistic aggregates the bytes field, calculating a cumulative total of the data transferred between hosts. For additional context, flow_direction is included. The result of this aggregation is temporarily assigned to the Data_Transferred field. The results are then sorted by Data_Transferred in descending order, and the two largest pairs are returned.
Query
parse @message "* * * * * * * * * * * * * * * * * * * * * * * * * * *" as account_id, vpc_id, subnet_id, interface_id, instance_id, srcaddr, srcport, dstaddr, dstport, protocol, packets, bytes, action, log_status, start, end, flow_direction, traffic_path, tcp_flags, pkt_srcaddr, pkt_src_aws_service, pkt_dstaddr, pkt_dst_aws_service, region, az_id, sublocation_type, sublocation_id
| stats sum(bytes) as Data_Transferred by srcaddr, dstaddr, flow_direction
| sort by Data_Transferred desc
| limit 2
Results
srcaddr | dstaddr | flow_direction | Data_Transferred |
---|---|---|---|
172.31.1.247 | 3.230.172.154 | egress | 346952038 |
172.31.0.46 | 3.230.172.154 | egress | 343799447 |
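The `stats sum(bytes) … | sort … | limit 2` pipeline above can be sketched locally in Python. The records below are illustrative, not taken from the article's results; the point is the group-sum-sort-limit pattern.

```python
from collections import defaultdict

# Illustrative flow log records: (srcaddr, dstaddr, flow_direction, bytes).
records = [
    ("172.31.1.247", "3.230.172.154", "egress", 300),
    ("172.31.1.247", "3.230.172.154", "egress", 200),
    ("172.31.0.46",  "3.230.172.154", "egress", 400),
    ("10.0.0.5",     "10.0.0.9",      "ingress", 50),
]

# stats sum(bytes) as Data_Transferred by srcaddr, dstaddr, flow_direction
totals = defaultdict(int)
for src, dst, direction, nbytes in records:
    totals[(src, dst, direction)] += nbytes

# sort by Data_Transferred desc | limit 2
top_two = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:2]
```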
Analyze data transfers by EC2 instance ID
You can use custom VPC Flow Logs to analyze traffic by Amazon Elastic Compute Cloud (Amazon EC2) instance ID directly. Building on the previous query, you can determine the most active EC2 instances by aggregating on the instance_id field.
Query
parse @message "* * * * * * * * * * * * * * * * * * * * * * * * * * *" as account_id, vpc_id, subnet_id, interface_id, instance_id, srcaddr, srcport, dstaddr, dstport, protocol, packets, bytes, action, log_status, start, end, flow_direction, traffic_path, tcp_flags, pkt_srcaddr, pkt_src_aws_service, pkt_dstaddr, pkt_dst_aws_service, region, az_id, sublocation_type, sublocation_id
| stats sum(bytes) as Data_Transferred by instance_id
| sort by Data_Transferred desc
| limit 5
Results
instance_id | Data_Transferred |
---|---|
- | 1443477306 |
i-03205758c9203c979 | 517558754 |
i-0ae33894105aa500c | 324629414 |
i-01506ab9e9e90749d | 198063232 |
i-0724007fef3cb06f3 | 54847643 |
Filter for rejected SSH traffic
To better understand the traffic that your security groups and network access control lists (ACLs) denied, filter on rejected VPC Flow Logs. You can narrow this filter further by protocol and destination port. To identify hosts that had SSH traffic rejected, extend the filter to include the TCP protocol (protocol 6) and traffic with a destination port of 22.
Query
parse @message "* * * * * * * * * * * * * * * * * * * * * * * * * * *" as account_id, vpc_id, subnet_id, interface_id, instance_id, srcaddr, srcport, dstaddr, dstport, protocol, packets, bytes, action, log_status, start, end, flow_direction, traffic_path, tcp_flags, pkt_srcaddr, pkt_src_aws_service, pkt_dstaddr, pkt_dst_aws_service, region, az_id, sublocation_type, sublocation_id
| filter action = "REJECT" and protocol = 6 and dstport = 22
| stats sum(bytes) as SSH_Traffic_Volume by srcaddr
| sort by SSH_Traffic_Volume desc
| limit 2
Results
srcaddr | SSH_Traffic_Volume |
---|---|
23.95.222.129 | 160 |
179.43.167.74 | 80 |
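The filter-then-aggregate logic of this query can be sketched locally as follows. The records are illustrative; the conditions mirror the query's `filter action = "REJECT" and protocol = 6 and dstport = 22` clause.

```python
# Illustrative records: (srcaddr, dstport, protocol, action, bytes).
records = [
    ("23.95.222.129", 22, 6, "REJECT", 160),
    ("179.43.167.74", 22, 6, "REJECT", 80),
    ("172.31.0.5",    22, 6, "ACCEPT", 500),  # accepted, so excluded
    ("198.51.100.7",  80, 6, "REJECT", 40),   # not port 22, so excluded
]

# filter action = "REJECT" and protocol = 6 and dstport = 22
# | stats sum(bytes) as SSH_Traffic_Volume by srcaddr
volumes = {}
for src, dstport, proto, action, nbytes in records:
    if action == "REJECT" and proto == 6 and dstport == 22:
        volumes[src] = volumes.get(src, 0) + nbytes

# sort by SSH_Traffic_Volume desc | limit 2
top = sorted(volumes.items(), key=lambda kv: kv[1], reverse=True)[:2]
```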
Isolate HTTP data stream for a specific source/destination pair
To further investigate trends in your data using CloudWatch Logs Insights, isolate bidirectional traffic between two IP addresses. In this query, ["172.31.1.247","172.31.11.212"] returns flow logs using either IP address as the source or destination IP address. To isolate HTTP traffic, the filter statements match VPC Flow Log events with protocol 6 (TCP) and port 80. Use the display keyword to return a subset of all available fields.
Query
#HTTP Data Stream for Specific Source/Destination Pair
parse @message "* * * * * * * * * * * * * * * * * * * * * * * * * * *" as account_id, vpc_id, subnet_id, interface_id, instance_id, srcaddr, srcport, dstaddr, dstport, protocol, packets, bytes, action, log_status, start, end, flow_direction, traffic_path, tcp_flags, pkt_srcaddr, pkt_src_aws_service, pkt_dstaddr, pkt_dst_aws_service, region, az_id, sublocation_type, sublocation_id
| filter srcaddr in ["172.31.1.247","172.31.11.212"] and dstaddr in ["172.31.1.247","172.31.11.212"] and protocol = 6 and (dstport = 80 or srcport = 80)
| display interface_id, srcaddr, srcport, dstaddr, dstport, protocol, bytes, action, log_status, start, end, flow_direction, tcp_flags
| sort by start desc
| limit 2
Results
interface_id | srcaddr | srcport | dstaddr | dstport | protocol | bytes | action | log_status |
---|---|---|---|---|---|---|---|---|
eni-0b74120275654905e | 172.31.11.212 | 80 | 172.31.1.247 | 29376 | 6 | 5160876 | ACCEPT | OK |
eni-0b74120275654905e | 172.31.1.247 | 29376 | 172.31.11.212 | 80 | 6 | 97380 | ACCEPT | OK |
Visualize data transfers over time
You can use CloudWatch Logs Insights to visualize results as a bar or pie chart. If the query includes the bin() function, then query results are returned with a timestamp. This time series can then be visualized with a line or stacked area chart.
Building on the previous query, you can use stats sum(bytes) as Data_Transferred by bin(1m) to calculate the cumulative data transferred over one-minute intervals. To view this visualization, toggle between the Logs and Visualization tabs in the CloudWatch Logs Insights console.
Query
parse @message "* * * * * * * * * * * * * * * * * * * * * * * * * * *" as account_id, vpc_id, subnet_id, interface_id, instance_id, srcaddr, srcport, dstaddr, dstport, protocol, packets, bytes, action, log_status, start, end, flow_direction, traffic_path, tcp_flags, pkt_srcaddr, pkt_src_aws_service, pkt_dstaddr, pkt_dst_aws_service, region, az_id, sublocation_type, sublocation_id
| filter srcaddr in ["172.31.1.247","172.31.11.212"] and dstaddr in ["172.31.1.247","172.31.11.212"] and protocol = 6 and (dstport = 80 or srcport = 80)
| stats sum(bytes) as Data_Transferred by bin(1m)
Results
bin(1m) | Data_Transferred |
---|---|
2022-04-01 15:23:00.000 | 17225787 |
2022-04-01 15:21:00.000 | 17724499 |
2022-04-01 15:20:00.000 | 1125500 |
2022-04-01 15:19:00.000 | 101525 |
2022-04-01 15:18:00.000 | 81376 |
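The bin(1m) bucketing can be sketched locally: each event's start timestamp is truncated down to the start of its minute, and bytes are summed per bucket. The epoch timestamps and byte counts below are illustrative.

```python
from datetime import datetime, timezone

# Illustrative events: (start epoch seconds, bytes).
events = [
    (1648826580, 1000),  # 2022-04-01 15:23:00 UTC
    (1648826590, 500),   # 15:23:10 -> falls in the same 1-minute bucket
    (1648826460, 200),   # 15:21:00
]

# stats sum(bytes) as Data_Transferred by bin(1m):
# truncate each timestamp to the start of its minute, then sum.
buckets = {}
for start, nbytes in events:
    minute = start - (start % 60)
    buckets[minute] = buckets.get(minute, 0) + nbytes

# Build an ascending time series suitable for a line chart.
series = sorted(
    (datetime.fromtimestamp(t, tz=timezone.utc), total)
    for t, total in buckets.items()
)
```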
Related information
Supported logs and discovered fields
Analyzing log data with CloudWatch Logs Insights
CloudWatch Logs Insights query commands
Tutorial: Run a query that produces a time series visualization