How can I use Amazon CloudWatch metrics to identify NAT gateway bandwidth issues?

My NAT gateway isn't delivering the bandwidth that I expect, and I want to use Amazon CloudWatch metrics to identify the bandwidth issues.

Short description

To identify the source of bandwidth issues with your NAT gateway, follow these steps:

  • Benchmark the networking throughput for your NAT gateway traffic and bytes per second for your Amazon Elastic Compute Cloud (Amazon EC2) instances.
  • Review the CloudWatch metrics for the NAT gateway that has issues.
  • Check all the instances behind the NAT gateway, and verify their CloudWatch metrics.
  • Compare the results between the benchmarking tests and the CloudWatch metrics.

Resolution

Benchmark the networking throughput

1.    Set up a test environment to benchmark your network throughput between Amazon EC2 Linux instances in the same virtual private cloud (VPC).

2.    Benchmark the traffic (bytes per second) that an instance can handle.

3.    Repeat these steps for each instance type that you have running behind the NAT gateway. To identify the instance types, see the Check the instances behind the NAT gateway section.

Review the CloudWatch metrics for issues with throughput or NAT gateway bandwidth

1.    Open the CloudWatch console.

2.    In the navigation pane, under Metrics, search for the NAT gateway.

3.    Select the NAT gateway, and then choose the PacketsDropCount metric.
Note: For a healthy NAT gateway, PacketsDropCount is always zero. A non-zero value indicates an ongoing transient issue with the NAT gateway. If the value isn't zero, then check the AWS Health Dashboard. If there are no notifications on the AWS Health Dashboard, then open a case with AWS Support.

4.    Select the NAT gateway, and then confirm that there's a value of zero for the ErrorPortAllocation metric.
Note: A value greater than zero indicates that too many concurrent connections to the same destination are open through the NAT gateway.

5.    Select BytesOutToDestination, BytesOutToSource, BytesInFromDestination, and BytesInFromSource.
Note: Bandwidth in bits per second is calculated as (BytesOutToDestination + BytesOutToSource + BytesInFromDestination + BytesInFromSource) * 8 / (time period in seconds).
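The console steps above can also be approximated with the AWS SDK. The following is a minimal sketch, assuming boto3 is installed and credentials are configured. The function names, the NAT gateway ID, and the default period are hypothetical; the `AWS/NATGateway` namespace, the `NatGatewayId` dimension, and the metric names are the ones that CloudWatch publishes for NAT gateways.

```python
from datetime import datetime, timedelta, timezone

# Metrics reviewed in steps 3 to 5 of this section.
NAT_METRICS = [
    "PacketsDropCount",
    "ErrorPortAllocation",
    "BytesOutToDestination",
    "BytesOutToSource",
    "BytesInFromDestination",
    "BytesInFromSource",
]


def build_nat_metric_queries(nat_gateway_id, period_seconds=60):
    """Build one GetMetricData query per NAT gateway metric, using the Sum statistic."""
    return [
        {
            "Id": f"m{index}",  # query IDs must start with a lowercase letter
            "Label": name,
            "MetricStat": {
                "Metric": {
                    "Namespace": "AWS/NATGateway",
                    "MetricName": name,
                    "Dimensions": [
                        {"Name": "NatGatewayId", "Value": nat_gateway_id}
                    ],
                },
                "Period": period_seconds,
                "Stat": "Sum",
            },
        }
        for index, name in enumerate(NAT_METRICS)
    ]


def fetch_nat_metrics(nat_gateway_id, hours=1, period_seconds=60):
    """Return {metric name: {timestamp: value}} for the last `hours` hours.

    Pagination (NextToken) is ignored in this sketch, so very long time
    ranges might be truncated.
    """
    import boto3  # imported lazily so the query builder works without boto3

    cloudwatch = boto3.client("cloudwatch")
    end = datetime.now(timezone.utc)
    response = cloudwatch.get_metric_data(
        MetricDataQueries=build_nat_metric_queries(nat_gateway_id, period_seconds),
        StartTime=end - timedelta(hours=hours),
        EndTime=end,
    )
    return {
        result["Label"]: dict(zip(result["Timestamps"], result["Values"]))
        for result in response["MetricDataResults"]
    }
```

Calling `fetch_nat_metrics("nat-0123456789abcdef0")` with a real NAT gateway ID in place of the placeholder would return the per-period sums; any non-zero PacketsDropCount or ErrorPortAllocation value is worth investigating as described in the notes above.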

If you need bandwidth bursts of more than 100 Gbps, then split the resources between multiple subnets and create multiple NAT gateways. For optimal performance, create your EC2 instances across private subnets that are in the same Availability Zone as your NAT gateway.
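As a worked example of the bandwidth calculation from step 5, the following sketch applies the formula to one set of per-period sums. The function name and the sample byte counts are hypothetical and chosen only to illustrate the arithmetic.

```python
def nat_bandwidth_bps(
    bytes_out_to_destination,
    bytes_out_to_source,
    bytes_in_from_destination,
    bytes_in_from_source,
    period_seconds,
):
    """Apply the step 5 formula: total bytes * 8 / period, in bits per second."""
    if period_seconds <= 0:
        raise ValueError("period_seconds must be positive")
    total_bytes = (
        bytes_out_to_destination
        + bytes_out_to_source
        + bytes_in_from_destination
        + bytes_in_from_source
    )
    return total_bytes * 8 / period_seconds


# Hypothetical Sum values for a single 60-second period.
bps = nat_bandwidth_bps(
    bytes_out_to_destination=15_000_000_000,
    bytes_out_to_source=15_000_000_000,
    bytes_in_from_destination=7_500_000_000,
    bytes_in_from_source=7_500_000_000,
    period_seconds=60,
)
print(f"{bps / 1e9:.1f} Gbps")  # prints "6.0 Gbps"
```

The period passed to the function must match the period of the CloudWatch datapoints. For example, a Sum taken over a 5-minute period must be divided by 300, not 60.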

Check the instances behind the NAT gateway

1.    Open the Amazon VPC console.

2.    In the navigation pane, under Route Tables, select the route tables that have routes pointing to the NAT gateway.

3.    Select the Subnet Association view, and note all the subnet IDs.

4.    Open the Amazon EC2 console.

5.    In the navigation pane, under Instances, choose the settings icon to open Show/Hide Columns.

6.    Select Subnet ID and Instance Type.

7.    Note the IDs of all the instances that are launched in the subnets noted in step 3.
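The same lookup can be sketched with the AWS SDK instead of the console. This example assumes boto3 with configured credentials; the function names are hypothetical. It relies on the `route.nat-gateway-id` filter of `describe_route_tables` and the `subnet-id` filter of `describe_instances`. One caveat: subnets that use the main route table implicitly, without an explicit association, don't appear in the associations list and are missed by this sketch.

```python
def subnet_ids_from_route_tables(route_tables):
    """Collect the explicitly associated subnet IDs from DescribeRouteTables output."""
    subnet_ids = set()
    for table in route_tables:
        for association in table.get("Associations", []):
            subnet_id = association.get("SubnetId")
            if subnet_id:  # main-table associations carry no SubnetId
                subnet_ids.add(subnet_id)
    return sorted(subnet_ids)


def instances_behind_nat_gateway(nat_gateway_id):
    """Return (instance ID, instance type, subnet ID) tuples for a NAT gateway."""
    import boto3  # imported lazily so the helper above works without boto3

    ec2 = boto3.client("ec2")
    route_tables = ec2.describe_route_tables(
        Filters=[{"Name": "route.nat-gateway-id", "Values": [nat_gateway_id]}]
    )["RouteTables"]
    subnet_ids = subnet_ids_from_route_tables(route_tables)
    if not subnet_ids:
        return []

    instances = []
    paginator = ec2.get_paginator("describe_instances")
    pages = paginator.paginate(
        Filters=[{"Name": "subnet-id", "Values": subnet_ids}]
    )
    for page in pages:
        for reservation in page["Reservations"]:
            for instance in reservation["Instances"]:
                instances.append(
                    (
                        instance["InstanceId"],
                        instance["InstanceType"],
                        instance.get("SubnetId"),
                    )
                )
    return instances
```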

Verify the CloudWatch metrics for all the instances behind the NAT gateway

1.    Open the Amazon CloudWatch console.

2.    In the navigation pane, under Metrics, choose EC2.

3.    Select the IDs of all the instances behind the NAT gateway that were noted previously.

4.    Under the Metric Name column, select NetworkIn, NetworkOut, and CPUUtilization for all the instances, for the time period when you experienced bandwidth issues.

5.    Confirm that there are no CPU spikes or abnormal increases in traffic at the same time as the bandwidth issue.

6.    Activate the flow logs at the subnet level to review the traffic flowing through the NAT gateway. For more information about enabling flow logs, see Logging IP traffic using VPC Flow Logs.
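The check in step 5 can be sketched as a small filter over instance datapoints. This example assumes that NetworkIn and NetworkOut were retrieved with the Sum statistic (bytes per period) and CPUUtilization as a percentage. The function names, the thresholds, and the sample datapoints are hypothetical and need to be tuned to your benchmark results.

```python
def network_bps(network_in_bytes, network_out_bytes, period_seconds):
    """Convert per-period NetworkIn and NetworkOut byte sums to bits per second."""
    return (network_in_bytes + network_out_bytes) * 8 / period_seconds


def find_suspect_periods(
    datapoints, period_seconds, cpu_threshold=80.0, bps_threshold=1e9
):
    """Return (timestamp, cpu, bps) for periods at or above either threshold."""
    suspects = []
    for point in datapoints:
        bps = network_bps(point["NetworkIn"], point["NetworkOut"], period_seconds)
        if point["CPUUtilization"] >= cpu_threshold or bps >= bps_threshold:
            suspects.append((point["Timestamp"], point["CPUUtilization"], bps))
    return suspects


# Hypothetical 60-second datapoints for one instance.
sample = [
    {"Timestamp": "12:00", "NetworkIn": 600_000_000, "NetworkOut": 150_000_000, "CPUUtilization": 35.0},
    {"Timestamp": "12:01", "NetworkIn": 6_000_000_000, "NetworkOut": 1_500_000_000, "CPUUtilization": 40.0},
    {"Timestamp": "12:02", "NetworkIn": 600_000_000, "NetworkOut": 150_000_000, "CPUUtilization": 92.0},
]
for timestamp, cpu, bps in find_suspect_periods(sample, period_seconds=60):
    print(f"{timestamp}: CPU {cpu:.0f}%, {bps / 1e9:.2f} Gbps")
```

In this sample, the 12:01 period is flagged for traffic (1.00 Gbps) and the 12:02 period for CPU (92%), while 12:00 passes both checks.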

Compare the results

Check whether the sum of the networking throughput metrics across all the instances behind the NAT gateway exceeds 100 Gbps. If it does, then the bandwidth on the NAT gateway also reflects a value that's greater than 100 Gbps, and it's a best practice to split your traffic across multiple NAT gateways.

If the sum of the throughput metrics is less than 100 Gbps, then the bandwidth on the NAT gateway also reflects a value that's less than 100 Gbps, and the NAT gateway can sufficiently handle the traffic that flows through it.
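The comparison can be sketched as follows. The 100 Gbps figure comes from the text above; the function name and the per-instance throughput values are hypothetical.

```python
NAT_GATEWAY_LIMIT_BPS = 100e9  # the 100 Gbps figure used in this article


def compare_to_nat_limit(instance_bps_values, limit_bps=NAT_GATEWAY_LIMIT_BPS):
    """Sum per-instance throughput (bits per second) and compare it to the limit."""
    total_bps = sum(instance_bps_values)
    return {
        "total_gbps": total_bps / 1e9,
        "utilization": total_bps / limit_bps,
        "exceeds_limit": total_bps > limit_bps,
    }


# Hypothetical peak throughput of three instances behind one NAT gateway.
summary = compare_to_nat_limit([40e9, 35e9, 30e9])
if summary["exceeds_limit"]:
    print(f"{summary['total_gbps']:.0f} Gbps: split traffic across multiple NAT gateways")
else:
    print(f"{summary['total_gbps']:.0f} Gbps: within the NAT gateway limit")
```

For the sample values, the total is 105 Gbps, so the sketch reports that the traffic should be split across multiple NAT gateways.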


Related information

How do I set up a NAT gateway for a private subnet in Amazon VPC?

NAT gateways

Why can't my Amazon EC2 instance in a private subnet connect to the internet using a NAT gateway?

Compare NAT gateways and NAT instances

AWS OFFICIAL
Updated 1 year ago by AWS OFFICIAL