How can I use CloudWatch metrics to identify NAT gateway bandwidth issues?

My NAT gateway isn't receiving the bandwidth that I expect. I want to use Amazon CloudWatch metrics to identify bandwidth issues.

Resolution

Benchmark the networking throughput

Complete the following steps:

  1. Set up a test environment to benchmark your network throughput between Amazon Elastic Compute Cloud (Amazon EC2) Linux instances in the same Amazon Virtual Private Cloud (Amazon VPC).
  2. Benchmark the traffic that an instance can manage.
  3. Repeat the preceding steps for the different instance types that are running behind the NAT gateway. To identify the instance types, see the Check the instances behind the NAT gateway section of this article.

Check the CloudWatch metrics for issues with throughput or NAT gateway bandwidth

Complete the following steps:

  1. Open the CloudWatch console.
  2. In the navigation pane, choose Metrics.
  3. Select the NAT gateway, and then check whether there's a value that's greater than zero for the PacketsDropCount metric.
  4. Select the NAT gateway, and then check whether there's a value that's greater than zero for the ErrorPortAllocation metric.
  5. Select BytesOutToDestination, BytesOutToSource, BytesInFromDestination, and BytesInFromSource.
  6. Choose PeakPacketsPerSecond.
    Note: Use the Maximum statistic for this metric. PeakPacketsPerSecond calculates the average packet rate every 10 seconds for 60 seconds, and then reports the highest of those six rates.

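A healthy NAT gateway has a PacketsDropCount value of zero. If the value is greater than zero, then there might be an ongoing transient issue with the NAT gateway. Check the AWS Health Dashboard for notifications that are related to the NAT gateway. If there are no notifications, then open a case with AWS Support. An ErrorPortAllocation value that's greater than zero means that the NAT gateway couldn't allocate a source port, which typically indicates too many concurrent connections through the NAT gateway.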
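The check for non-zero PacketsDropCount and ErrorPortAllocation values can also be scripted. The following is a minimal sketch, assuming that the boto3 SDK is installed and AWS credentials are configured. The helper names nonzero_datapoints and fetch_sum_datapoints, the 1-hour window, and the 1-minute period are illustrative choices and aren't part of this article's procedure.

```python
from datetime import datetime, timedelta, timezone


def nonzero_datapoints(datapoints, stat="Sum"):
    """Return the datapoints whose statistic is greater than zero, oldest first."""
    hits = [dp for dp in datapoints if dp.get(stat, 0) > 0]
    return sorted(hits, key=lambda dp: dp["Timestamp"])


def fetch_sum_datapoints(nat_gateway_id, metric_name, hours=1):
    """Fetch 1-minute Sum datapoints for one NAT gateway metric (requires boto3)."""
    import boto3  # imported lazily so nonzero_datapoints works without the SDK

    end = datetime.now(timezone.utc)
    response = boto3.client("cloudwatch").get_metric_statistics(
        Namespace="AWS/NATGateway",
        MetricName=metric_name,
        Dimensions=[{"Name": "NatGatewayId", "Value": nat_gateway_id}],
        StartTime=end - timedelta(hours=hours),
        EndTime=end,
        Period=60,
        Statistics=["Sum"],
    )
    return response["Datapoints"]


def report(nat_gateway_id):
    """Print every minute in which either metric was greater than zero."""
    for name in ("PacketsDropCount", "ErrorPortAllocation"):
        for dp in nonzero_datapoints(fetch_sum_datapoints(nat_gateway_id, name)):
            print(name, dp["Timestamp"].isoformat(), dp["Sum"])
```

To run the check, call report with your own NAT gateway ID. An empty output means that neither metric was greater than zero during the window.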

To calculate the average bandwidth over a 1-minute interval, use one of the following formulas. The following formulas give the average bandwidth over a period of time but not the real per-second view of bandwidth. Depending on your usage patterns, the per-second bandwidth might have spikes and troughs. Your NAT gateway scales according to fluctuations in your traffic.

[(BytesOutToDestination + BytesOutToSource) * 8 / Time period in seconds]

[(BytesInFromDestination + BytesInFromSource) * 8 / Time period in seconds]

Note: For bandwidth bursts that exceed 100 Gbps, distribute the resources across multiple subnets and create multiple NAT gateways. For optimal performance, create your instances across private subnets in the same Availability Zone as your NAT gateway.
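As a worked example of the formulas above, the following sketch converts two byte counts into an average bit rate. It assumes that the byte values are the Sum statistic of each metric over the chosen period. The function name average_bandwidth_bps and the sample numbers are hypothetical.

```python
NAT_GATEWAY_LIMIT_BPS = 100e9  # 100 Gbps, the figure quoted in the preceding note


def average_bandwidth_bps(bytes_first, bytes_second, period_seconds):
    """Apply (bytes_first + bytes_second) * 8 / period to get bits per second."""
    if period_seconds <= 0:
        raise ValueError("period_seconds must be positive")
    return (bytes_first + bytes_second) * 8 / period_seconds


# Hypothetical sample: 3.75 GB to the destination and 3.75 GB to the source in 60 seconds.
outbound_bps = average_bandwidth_bps(3_750_000_000, 3_750_000_000, 60)
print(f"{outbound_bps / 1e9:.2f} Gbps average outbound")
print("Above 100 Gbps on average:", outbound_bps > NAT_GATEWAY_LIMIT_BPS)
```

Because the result is an average over the period, a short burst inside that period can be much higher than the number that's printed.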

Check the instances behind the NAT gateway

Complete the following steps:

  1. Open the Amazon VPC console.
  2. In the navigation pane, under Route tables, select the route tables that point to the NAT gateway.
  3. Select the Subnet association view, and note all the subnet IDs.
  4. Open the Amazon EC2 console.
  5. In the navigation pane, under Instances, choose the Settings icon to view the Show and Hide columns.
  6. Select Subnet ID and Instance type.
  7. Identify the IDs of all the instances that are running in the associated subnets.
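The console steps above can be approximated with the SDK. The following is a minimal sketch, assuming that boto3 is installed and credentials are configured. The function names are hypothetical. The sketch only follows explicit subnet associations, so subnets that implicitly use a main route table aren't included.

```python
def subnet_ids_from_route_tables(route_tables):
    """Collect explicitly associated subnet IDs from describe_route_tables output."""
    subnet_ids = set()
    for table in route_tables:
        for association in table.get("Associations", []):
            # Main route table associations carry no SubnetId and are skipped.
            if "SubnetId" in association:
                subnet_ids.add(association["SubnetId"])
    return sorted(subnet_ids)


def instances_behind_nat_gateway(nat_gateway_id):
    """Return (instance ID, instance type, subnet ID) tuples (requires boto3)."""
    import boto3  # imported lazily so the helper above works without the SDK

    ec2 = boto3.client("ec2")
    # Route tables that contain a route pointing to the NAT gateway.
    tables = ec2.describe_route_tables(
        Filters=[{"Name": "route.nat-gateway-id", "Values": [nat_gateway_id]}]
    )["RouteTables"]
    subnet_ids = subnet_ids_from_route_tables(tables)
    if not subnet_ids:
        return []

    found = []
    paginator = ec2.get_paginator("describe_instances")
    for page in paginator.paginate(Filters=[{"Name": "subnet-id", "Values": subnet_ids}]):
        for reservation in page["Reservations"]:
            for instance in reservation["Instances"]:
                found.append(
                    (instance["InstanceId"], instance["InstanceType"], instance.get("SubnetId"))
                )
    return found
```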

Check the CloudWatch metrics for all Amazon EC2 instances behind the NAT gateway

Complete the following steps:

  1. Open the Amazon CloudWatch console.
  2. In the navigation pane, under Metrics, choose EC2.
  3. Select the IDs of all the instances behind the NAT gateway.
  4. Under the Metric name column, select the NetworkIn, NetworkOut, and CPUUtilization metrics for all instances that were affected during the time you experienced bandwidth issues.
    Note: For instructions on how to check traffic usage, see Get statistics for a specific resource.
  5. Confirm that there are no CPU spikes or unusual increases in traffic at the same time as the bandwidth issue.
  6. Activate VPC Flow Logs at the subnet level to review traffic that passes through the NAT gateway. 
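To pull the same three metrics for many instances at once, you can build a batch request for the CloudWatch GetMetricData API. The following sketch only builds the query list. The function name build_ec2_metric_queries, the 5-minute period, and the choice of statistics (Sum for the network metrics, Maximum for CPU) are assumptions for illustration.

```python
def build_ec2_metric_queries(instance_ids, period=300):
    """Build GetMetricData queries for NetworkIn, NetworkOut, and CPUUtilization."""
    metrics = (("NetworkIn", "Sum"), ("NetworkOut", "Sum"), ("CPUUtilization", "Maximum"))
    queries = []
    for index, instance_id in enumerate(instance_ids):
        for metric_name, stat in metrics:
            queries.append(
                {
                    # Query IDs must start with a lowercase letter.
                    "Id": f"{metric_name.lower()}_{index}",
                    "Label": f"{instance_id} {metric_name}",
                    "MetricStat": {
                        "Metric": {
                            "Namespace": "AWS/EC2",
                            "MetricName": metric_name,
                            "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
                        },
                        "Period": period,
                        "Stat": stat,
                    },
                    "ReturnData": True,
                }
            )
    return queries
```

With boto3, you can pass the returned list as the MetricDataQueries argument of the CloudWatch client's get_metric_data call, together with StartTime and EndTime values that cover the time of the bandwidth issue.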

Compare the results

Check whether the combined network throughput of all instances behind the NAT gateway exceeds 100 Gbps in bursts. If it does, then the traffic demand is greater than the NAT gateway's 100 Gbps bandwidth limit. In this case, it's a best practice to distribute your traffic across multiple NAT gateways.
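The comparison can be sketched as a small aggregation. The following example assumes that you already collected per-instance byte counts (for example, NetworkIn plus NetworkOut sums) keyed by timestamp. The function name periods_over_limit and the sample data are hypothetical.

```python
def periods_over_limit(per_instance_bytes, period_seconds, limit_bps=100e9):
    """Return {timestamp: aggregate bps} for periods whose total exceeds limit_bps.

    per_instance_bytes maps instance ID -> {timestamp: bytes in that period}.
    """
    totals = {}
    for series in per_instance_bytes.values():
        for timestamp, byte_count in series.items():
            totals[timestamp] = totals.get(timestamp, 0) + byte_count

    over = {}
    for timestamp in sorted(totals):
        bps = totals[timestamp] * 8 / period_seconds
        if bps > limit_bps:
            over[timestamp] = bps
    return over


# Hypothetical 60-second byte sums for two instances.
sample = {
    "i-aaa": {"t0": 400_000_000_000, "t1": 100_000_000_000},
    "i-bbb": {"t0": 400_000_000_000, "t1": 100_000_000_000},
}
for timestamp, bps in periods_over_limit(sample, 60).items():
    print(timestamp, f"{bps / 1e9:.1f} Gbps")
```

The result is based on averages for each period, so it can miss bursts that are shorter than the period. A shorter period gives a closer view of burst behavior.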

Related information

How do I set up a NAT gateway for a private subnet in Amazon VPC?

NAT gateways

Why can't my Amazon EC2 instance in a private subnet connect to the internet using a NAT gateway?

Compare NAT gateways and NAT instances