How to figure out whether NAT Gateway processing charges are due to internet bound traffic or traffic within AWS?
Identifying the cause of NAT Gateway processing charges from VPC flow logs can be overwhelming, especially when you have no idea which type of traffic, internet bound or AWS bound, is dominant. In this article, you will use Cost Explorer, an easy-to-use interface that lets you visualize, understand, and manage your AWS costs, to narrow down the dominant traffic type.
A NAT gateway is a Network Address Translation (NAT) service. You can use a NAT gateway so that instances in a private subnet can connect to services outside your Virtual Private Cloud (VPC), while external services cannot initiate a connection with those instances. These external services can reside either outside the AWS cloud or within AWS. We refer to traffic to or from services or endpoints outside the AWS cloud as internet bound, and to traffic to or from services such as other VPCs, AWS public endpoints, or a remote AWS Region as AWS bound. The NAT gateway processing charge is based on the amount of traffic (in GB) that traverses the NAT gateway, whether inbound to or outbound from the VPC. In some cases, the NAT gateway processing charge seems unexpectedly high and requires further investigation. Various tools, such as VPC flow logs, are available to analyze this traffic.
In this article, I will help you use Cost Explorer to get a high-level idea of whether the dominant traffic is AWS bound or internet bound. First, you narrow down the AWS account and the Region where most of the NAT gateway processing charges come from. Second, you identify how much of the NAT gateway processing charge is caused by internet bound traffic and how much by AWS bound traffic. Lastly, you dive deep into the dominant traffic, referring to relevant articles on what causes it.
In the right pane, under Report Parameters > Time, update Date Range = Past: 3 Months (preferably 3 months to date), and Granularity = Daily
Set Dimension = Linked account under Group by
Set Usage type = NatGateway-Bytes (GB) under Filters
Note the AWS account that shows the highest usage in the Cost and usage graph or the Cost and usage breakdown
Add the AWS account as a filter by setting Linked account = <the account noted in the previous step> under Filters
Set Dimension = Region under Group by
Note the AWS Region that shows the highest usage in the Cost and usage graph or the Cost and usage breakdown
Add the Region as a filter by setting Region = <the Region noted in the previous step> under Filters
Note the Total Usage in GB as X from Cost and usage breakdown
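The same query can also be expressed against the Cost Explorer API instead of the console. The following is a minimal sketch, not a definitive implementation: it only builds the request parameters for the GetCostAndUsage operation. The helper name `nat_usage_query`, the dates, and the account ID are hypothetical, and the exact usage type strings are an assumption (in the API they are usually prefixed with a Region code, for example `USE2-NatGateway-Bytes`), so check them against your own bill.

```python
from datetime import date, timedelta


def nat_usage_query(end, usage_types, days=90, group_by="LINKED_ACCOUNT",
                    account=None, region=None):
    """Build keyword arguments for Cost Explorer's GetCostAndUsage call.

    usage_types: exact usage type strings (assumed; often Region-prefixed).
    group_by:    "LINKED_ACCOUNT" first, then "REGION" once the account is known.
    """
    conditions = [{"Dimensions": {"Key": "USAGE_TYPE", "Values": list(usage_types)}}]
    if account:
        conditions.append({"Dimensions": {"Key": "LINKED_ACCOUNT", "Values": [account]}})
    if region:
        conditions.append({"Dimensions": {"Key": "REGION", "Values": [region]}})
    # A single condition is passed as-is; several are combined with "And".
    expression = conditions[0] if len(conditions) == 1 else {"And": conditions}
    return {
        # The End date is exclusive in the Cost Explorer API.
        "TimePeriod": {
            "Start": (end - timedelta(days=days)).isoformat(),
            "End": end.isoformat(),
        },
        "Granularity": "DAILY",
        "Metrics": ["UsageQuantity"],
        "GroupBy": [{"Type": "DIMENSION", "Key": group_by}],
        "Filter": expression,
    }


# First pass: find the account with the highest NAT gateway usage.
params = nat_usage_query(date(2024, 4, 1), ["NatGateway-Bytes"])
# Second pass: filter on that (hypothetical) account and group by Region.
by_region = nat_usage_query(date(2024, 4, 1), ["NatGateway-Bytes"],
                            group_by="REGION", account="111122223333")
```

If boto3 is installed and credentials are configured, the resulting dictionary can be passed as `boto3.client("ce").get_cost_and_usage(**params)`.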
Section II: Identifying dominant traffic - Internet or AWS bound
Continue from the last step of Section I, and click Clear next to Usage type under Filters
Type "datatransfer-in-bytes" as a value for Usage type, and select all the values. This should typically show -DataTransfer-In-Bytes (GB) values.
Type "datatransfer-out-bytes" as a value for Usage type, and select all the values. This should typically show -DataTransfer-Out-Bytes (GB) values.
Set Service = EC2-Instances (Elastic Compute Cloud - Compute) under Filters. NAT Gateway in this case is billed under EC2 instance as a service.
Set Dimension = Usage type under Group by
Note the Total Usage in GB as Y from Cost and usage breakdown
Note: To understand the pattern, repeat Sections I and II over the last 7 days or the suspected dates, and/or at hourly granularity.
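If you retrieve the Section II data through the Cost Explorer API rather than the console, Y is the sum of usage across the returned usage type groups. The sketch below assumes a response shaped like the output of GetCostAndUsage grouped by usage type. The `sample` data is invented for illustration, and the helper name `usage_by_type` is hypothetical.

```python
from collections import defaultdict


def usage_by_type(response):
    """Sum UsageQuantity per usage type across all returned time periods."""
    totals = defaultdict(float)
    for period in response["ResultsByTime"]:
        for group in period["Groups"]:
            usage_type = group["Keys"][0]
            # The API returns amounts as strings.
            totals[usage_type] += float(group["Metrics"]["UsageQuantity"]["Amount"])
    return dict(totals)


# Invented two-day sample, not real billing data.
sample = {
    "ResultsByTime": [
        {"TimePeriod": {"Start": "2024-03-01", "End": "2024-03-02"},
         "Groups": [
             {"Keys": ["DataTransfer-In-Bytes"],
              "Metrics": {"UsageQuantity": {"Amount": "40.0", "Unit": "GB"}}},
             {"Keys": ["DataTransfer-Out-Bytes"],
              "Metrics": {"UsageQuantity": {"Amount": "10.0", "Unit": "GB"}}},
         ]},
        {"TimePeriod": {"Start": "2024-03-02", "End": "2024-03-03"},
         "Groups": [
             {"Keys": ["DataTransfer-In-Bytes"],
              "Metrics": {"UsageQuantity": {"Amount": "35.5", "Unit": "GB"}}},
             {"Keys": ["DataTransfer-Out-Bytes"],
              "Metrics": {"UsageQuantity": {"Amount": "4.5", "Unit": "GB"}}},
         ]},
    ]
}

per_type = usage_by_type(sample)
y_total = sum(per_type.values())  # Y: inbound plus outbound data transfer, in GB
```

Keeping the per-type totals alongside Y also shows which direction, inbound or outbound, dominates.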
Section III: Diving deep and next steps
Based on the X and Y data aggregated monthly, daily, or hourly, you can take one of the following routes to dive deeper:
If X (NatGateway-Bytes) is nearly equal to Y (the sum of DataTransfer-In-Bytes and DataTransfer-Out-Bytes), or the two show a matching pattern, the NAT Gateway processing charges are driven by internet bound traffic. Review the applications running in the VPC's private network to see why they need to pull data in from or push data out to the internet. If most of the internet bound traffic comes from DataTransfer-In-Bytes, check whether this is by design or whether an application fault is causing data to be downloaded continuously from public sites. One example is a containerized application that keeps crashing, so that a Docker image is pulled frequently from the public Docker Hub. If it is by design, check whether it is feasible to securely host such an application as a public endpoint. To get the exact source and destination of such traffic, I recommend analyzing it with VPC flow logs.
If X (NatGateway-Bytes) is significantly higher than Y (the sum of DataTransfer-In-Bytes and DataTransfer-Out-Bytes), or the two show no matching pattern, the NAT Gateway processing charges are driven by AWS bound traffic. Typically, such traffic results from uploading data to or downloading data from an Amazon S3 bucket without using an S3 gateway endpoint, from data transfer to or from an AWS public endpoint without VPC interface endpoints, or from data transfer to or from your own application in another VPC over the AWS public network. To get the exact source and destination of such traffic, I recommend analyzing it with VPC flow logs.
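The comparison of X and Y can be sketched as a small helper. The article does not define how close "nearly equal" is, so the 20% tolerance below is an assumption, as is the function name `dominant_traffic`. The "inconclusive" result for Y being well above X is also an addition for completeness that the two routes above do not cover.

```python
def dominant_traffic(nat_gb, transfer_gb, tolerance=0.2):
    """Classify NAT gateway traffic from X (nat_gb) and Y (transfer_gb).

    tolerance is an assumed threshold for "nearly equal", not an AWS-defined value.
    """
    if nat_gb <= 0:
        raise ValueError("nat_gb must be positive")
    ratio = transfer_gb / nat_gb
    if abs(ratio - 1.0) <= tolerance:
        # X is nearly equal to Y: charges driven by internet bound traffic.
        return "internet-bound"
    if ratio < 1.0 - tolerance:
        # X is significantly higher than Y: charges driven by AWS bound traffic.
        return "AWS-bound"
    # Y is well above X; the comparison above does not explain this case.
    return "inconclusive"


print(dominant_traffic(nat_gb=100.0, transfer_gb=95.0))
print(dominant_traffic(nat_gb=100.0, transfer_gb=20.0))
```

The same function can be applied to each day or hour of data to check whether the two series follow a matching pattern, rather than comparing only the totals.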