How do I troubleshoot network performance issues between EC2 Linux or Windows instances in a VPC and an on-premises host over the internet gateway?

I want to troubleshoot packet loss or latency issues between my Amazon Elastic Compute Cloud (Amazon EC2) instances in an Amazon Virtual Private Cloud (Amazon VPC) and an on-premises host over the internet gateway.

Short description

To diagnose network issues such as packet loss or latency, first test the network to isolate the source of the issue. Before you troubleshoot, it's a best practice to benchmark network performance so that you have baseline results to compare against.

Prerequisites:

  • Be sure that the network utilities are installed on both endpoints (on the EC2 instance and the on-premises host).
  • Use an EC2 instance that supports enhanced networking, and be sure that the drivers are up to date. If enhanced networking isn't turned on, then see Troubleshoot the ENA kernel driver on Linux or Troubleshoot the Elastic Network Adapter Windows driver.
  • Connect to your EC2 instance, and then verify end-to-end connectivity between your EC2 instance and your on-premises host.

Resolution

Install the following tools to help troubleshoot and test your network:

  • AWSSupport-SetupIPMonitoringFromVPC to collect network diagnostics such as packet loss and latency statistics, along with MTR, tcptraceroute, and tracepath output.
  • MTR to check for ICMP or TCP packet loss and latency problems.
  • Traceroute to determine latency or routing problems.
  • Hping3 to determine end-to-end TCP packet loss and latency problems.
  • Tcpdump to analyze packet capture samples.

Review traceroute or MTR reports

Review the hops on traceroute or MTR reports using a bottom-up approach. In a bottom-up approach, first check for loss on the last hop or destination, and then review the preceding hops.

  • If the packet loss or latency issues continue through the last hop, then there might be a network or routing issue.
  • If packet loss or latency appears on only one hop in the path, then the cause might be a control plane rate limit on that node.
  • Check whether the last hop reported is the destination noted in the command. If the last hop reported isn't the destination, then a restrictive security group might be blocking the traffic.

Test performance using AWSSupport-SetupIPMonitoringFromVPC

This built-in tool collects many of the metrics that you need to troubleshoot your network. For more information, see Debugging tool for network connectivity from Amazon VPC.

Performance troubleshooting for Linux instances

Check the Linux performance statistics

If you have access to the source instance or destination instance, then check for issues with the CPU, memory utilization, and load average.
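
As a minimal sketch of this check, the following commands print the load averages and memory figures from /proc. The sketch assumes a Linux host with /proc mounted and the nproc utility available. The load_per_core function is a hypothetical helper and isn't part of any AWS tool. A 1-minute load average that stays well above the number of CPU cores might point to resource contention on the host rather than a network issue.

```shell
# Load averages over the last 1, 5, and 15 minutes.
cut -d ' ' -f 1-3 /proc/loadavg

# Total and currently available memory, in kB.
grep -E '^(MemTotal|MemAvailable):' /proc/meminfo

# Hypothetical helper: 1-minute load average divided by the CPU core count.
load_per_core() {
    awk -v cores="$(nproc)" '{ printf "%.2f\n", $1 / cores }' /proc/loadavg
}
load_per_core
```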

Test performance using MTR

The Linux MTR command provides continual, updated output. This diagnostic tool combines the functionality of traceroute and ping utilities. The output from this tool allows you to analyze network performance. Most Linux distributions come with traceroute and MTR preinstalled. You can also download MTR from your distribution's software package manager.

To install MTR, run the following commands:

Amazon Linux:

sudo yum install mtr

Ubuntu:

sudo apt-get install mtr-tiny

To test your network's performance using MTR, run this test bidirectionally between the public IP address of your EC2 instances and your on-premises host. If the direction is reversed, then the path between nodes on a TCP/IP network can change. It's a best practice that you obtain MTR results for both directions. You can use a TCP-based trace instead of ICMP because most internet devices deprioritize ICMP-based trace requests.

Review your packet loss. Packet loss on a single hop usually doesn't indicate an issue. The loss can be the result of a control plane policy that causes the "ICMP time exceeded" messages to be dropped. If you notice sustained packet loss until the destination hop, or packet loss over several hops, then this loss might indicate a problem.

Note: It's common to see a few requests time out.

In the following commands, replace PUBLIC_IP with the public IP address of the EC2 instance or the on-premises host that you're testing against.

ICMP-based MTR:

mtr -n -c 200 PUBLIC_IP --report

TCP-based MTR:

mtr -n -T -c 200 PUBLIC_IP --report

The -T argument performs a TCP-based MTR, and the -n argument skips resolving IP addresses to hostnames. The --report option puts MTR into report mode. In report mode, MTR runs for the number of cycles specified by the -c option, prints the statistics, and then exits.

Note: By default, the TCP-based MTR tests destination TCP port 80. To test a specific destination TCP port, append -P followed by the port number. The following example tests destination TCP port 443:

mtr -n -T -c 200 PUBLIC_IP -P 443 --report
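
The bottom-up review of a saved MTR report can be sketched as a small shell function. This is an illustrative sketch and not part of MTR. The function name review_mtr_report is hypothetical. The sketch assumes the text layout of mtr --report where each hop line starts with the hop number followed by "|--", the host is the second field, and Loss% is the third field. Run MTR without options that add extra columns.

```shell
# Hypothetical helper: read an `mtr --report` text report on stdin and
# review it bottom-up. Loss on the last hop is reported as end-to-end loss.
# Loss on an earlier hop that does not reach the last hop is flagged as
# possible control plane rate limiting.
review_mtr_report() {
    awk '
        /^ *[0-9]+\.[|]--/ {
            n++
            host[n] = $2
            loss[n] = $3
            sub(/%/, "", loss[n])   # "5.0%" -> "5.0"
            loss[n] += 0            # convert to a number
        }
        END {
            if (n == 0) { print "no hops found"; exit 1 }
            if (loss[n] > 0)
                printf "destination %s: %.1f%% loss, review the preceding hops\n", host[n], loss[n]
            else
                printf "destination %s: no loss\n", host[n]
            for (i = n - 1; i >= 1; i--)
                if (loss[i] > 0 && loss[n] == 0)
                    printf "hop %d (%s): %.1f%% loss, not seen at the destination (possible control plane rate limiting)\n", i, host[i], loss[i]
        }
    '
}

# Example usage (commented out because it needs mtr and a reachable host):
# mtr -n -T -c 200 PUBLIC_IP --report | review_mtr_report
```

The function only summarizes the report. Treat its output as a starting point, and confirm the result against the full report for both directions.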

Test performance using traceroute

The Linux traceroute utility identifies the path taken from a client node to the destination node. The utility records the time in milliseconds for each router to respond to the request. The utility also calculates the amount of time that each hop takes before reaching its destination.

To install traceroute, run the following commands:

Amazon Linux:

sudo yum install traceroute

Ubuntu:

sudo apt-get update
sudo apt-get install traceroute

Note: If you run an MTR report, then traceroute isn't necessary. MTR provides latency and packet loss statistics to a destination.

Be sure that port 22 or the port that you're testing is open in both directions. To troubleshoot network connectivity using traceroute, run the command from the client to the server. Then, run the command from the server back to the client. The path between nodes on a TCP/IP network can change if the direction is reversed. Use a TCP-based trace to your application port instead of an ICMP-based trace, because most internet devices deprioritize ICMP-based trace requests.

ICMP-based traceroute:

sudo traceroute -I PUBLIC_IP

TCP-based traceroute:

sudo traceroute -n -T -p 22 PUBLIC_IP

The -T and -p 22 arguments perform a TCP-based trace on port 22, and the -n argument skips resolving IP addresses to hostnames.

Note: You can use your application-specific port for testing. Testing on that port helps you determine whether any intermediate devices in the path are dropping your application traffic.

Test performance using hping3

Hping3 is a command-line TCP/IP packet assembler and analyzer that measures end-to-end packet loss and latency over a TCP connection. Download hping3 at the Die.net site.

In addition to ICMP echo requests, hping3 supports TCP, UDP, and RAW-IP protocols. Hping3 also includes a traceroute mode and can send files over a covert channel. Hping3 can scan hosts, assist with penetration testing, and test intrusion detection systems.

MTR and traceroute capture per-hop latency. However, in addition to packet loss, hping3 results show end-to-end min/avg/max latency over TCP.

To install hping3, run the following commands:

Amazon Linux 2:

Install the EPEL release package for RHEL 7, and then activate the EPEL repository:

sudo amazon-linux-extras install epel -y

Then, install hping3 from the EPEL repository:

sudo yum --enablerepo=epel install hping3

Ubuntu:

sudo apt-get install hping3

The following command sends 50 TCP SYN packets over port 0. By default, hping3 sends TCP headers to the target host's port 0 with a window size of 64 and without a TCP flag. The -S option sets the SYN flag, and the -c option sets the number of packets to send:

sudo hping3 -S -c 50 -V PUBLIC_IP

The following command sends 50 TCP SYN packets over port 22:

sudo hping3 -S -c 50 -V PUBLIC_IP -p 22

Note: Be sure that port 22 or the port that you're testing is open.
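
To compare runs against your benchmarks, you can reduce the hping3 summary to a single line. The following is a sketch only. The function name summarize_hping3 is hypothetical. The sketch assumes that the summary contains a line that ends with "% packet loss" and a line in the form "round-trip min/avg/max = a/b/c ms". Compare this assumed layout with the output of your hping3 version before you rely on it.

```shell
# Hypothetical helper: read hping3 output on stdin and print the packet
# loss percentage and the end-to-end min/avg/max round-trip times (ms).
summarize_hping3() {
    awk '
        /packet loss/ {
            # The loss percentage is the field that ends with "%".
            for (i = 1; i <= NF; i++)
                if ($i ~ /%$/) loss = $i
        }
        /^round-trip min\/avg\/max/ {
            # Field 4 holds the three values separated by "/".
            split($4, rtt, "/")
        }
        END {
            printf "loss=%s min=%s avg=%s max=%s\n", loss, rtt[1], rtt[2], rtt[3]
        }
    '
}

# Example usage (commented out because it needs root and hping3).
# 2>&1 merges stderr into stdout in case the summary is printed to stderr.
# sudo hping3 -S -c 50 -V PUBLIC_IP -p 22 2>&1 | summarize_hping3
```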

Test packet capture samples using tcpdump

It's a best practice to perform simultaneous packet captures on your EC2 instance and on-premises host when diagnosing packet loss or latency issues. These captures can help to identify the request and response packets so that you can isolate the issue at the networking and application layers. It's also a best practice to first start the packet capture, and then initiate the traffic. This order of action helps capture all packets for the flow.

To install tcpdump, run the following commands:

Amazon Linux:

sudo yum install tcpdump

Ubuntu:

sudo apt-get install tcpdump

After you install tcpdump, run the following command to capture the TCP port 22 traffic and save the output in a pcap file:

sudo tcpdump -i eth0 port 22 -s0 -w samplecapture.pcap

Note: The tcpdump flag -i specifies the interface on the instance where tcpdump captures the traffic. You might need to change the interface from eth0 to the configured interface in your environment.

Performance troubleshooting for Windows

Check for ECN capability

  1. To determine if Explicit Congestion Notification (ECN) capability is turned on, run the following command:

    netsh interface tcp show global
  2. If ECN capability is activated, then to deactivate it, run the following command:

    netsh interface tcp set global ecncapability=disabled
  3. If you don't see an improvement in performance, then run the following command to reactivate ECN capability:

    netsh interface tcp set global ecncapability=enabled

Review hops and troubleshoot TCP port connectivity

First, use MTR or tracert to review hops.

MTR method:

  1. Download and then install WinMTR from the Sourceforge.net website.
  2. Enter the destination IP in the Host section, and then choose Start.
  3. Let the test run for a minute, and then choose Stop.
  4. Choose Copy text to clipboard and paste the output in a text file.
  5. Look for any losses in the % column that are propagated to the destination.
    Note: Ignore any hops with the No response from host message. This message indicates that those particular hops aren't responding to the ICMP probes.
  6. Review hops on the MTR reports using a bottom-up approach. For example, check for loss on the last hop or destination, and then review the preceding hops.

Tracert method:
If you don't want to install WinMTR, then use the tracert command line utility.

  1. Perform a tracert to the destination URL or IP address:

    tracert -d PUBLIC_IP

    Note: The -d option prevents tracert from resolving IP addresses to hostnames. Remove -d if IP address to hostname resolution is required.

  2. Look for any hop that shows an abrupt spike in round-trip time (RTT). An abrupt spike in RTT might indicate that there's a node under high load. This load can induce latency or packet drops in your traffic.

  3. Then, check TCP port connectivity.

Note: Because WinMTR and tracert are both ICMP-based, you can use tracetcp to troubleshoot TCP port connectivity.

  1. Download the tracetcp ZIP file from the NetworkHunt.com website.
  2. Extract the tracetcp ZIP file.
  3. Copy tracetcp.exe to your C: drive.
  4. Download and then install WinPcap from the WinPcap.org website.
  5. Open the command prompt, and then use the cd command to change to the root of your C: drive, where you copied tracetcp.exe.
  6. To run tracetcp, use the following commands:
    tracetcp.exe hostname:port
    -or-
    tracetcp.exe ip:port

Check the Windows Task Manager

If you have access to the source instance or destination instance, then check the Windows Task Manager. Look for issues with CPU and memory utilization, or load average.

Take a packet capture

Note: When you diagnose packet loss or latency issues, it's a best practice to perform simultaneous packet captures on your EC2 instance and your on-premises host. This action helps to identify the request and response packets to isolate the issue at the network and application layers. It's also a best practice to first start the packet capture and then initiate the traffic. This order of action helps capture all packets for the flow.

  1. Download and install Wireshark from the Wireshark.org website. Then, take a packet capture.
  2. Use the following filter to isolate the traffic from a particular source in the packet capture: (ip.addr eq source_IP) && (tcp.flags.syn == 1).
    The output shows all the TCP streams initiated by that source IP.
  3. Select the row with the relevant source IP and destination IP.
  4. Open the context (right-click) menu, and then choose Follow, TCP Stream. This action displays the TCP flow between the source IP and destination IP that you want to investigate.
  5. Look for retransmissions, duplicate packets, or TCP window size notifications such as TCP window full or Window size zero. These notifications might indicate that the TCP buffers are running out of space.

If you find packet loss, or if the number of hops changes significantly from your benchmarks, then refer to your networking equipment vendor documentation. If you work in a multi-homed network environment, then perform these tests using a different ISP.

AWS OFFICIAL
Updated 2 months ago