Troubleshoot performance issues (packet drop, latency, or slow throughput) between an on-premises network and an AWS VPC when using AWS Site-to-Site VPN

Content level: Expert

In this document, we discuss how to troubleshoot performance issues between an on-premises network and an AWS VPC when you use AWS Site-to-Site VPN.

To troubleshoot a performance issue, run the following tests. Their results help you gauge where and how the issue occurs.

Figure: Virtual Private Gateway as a target gateway

Figure: Transit Gateway as a target gateway

Test 1: Ping

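Ping is the most basic tool for testing connectivity between two hosts. Its output provides the following attributes for analyzing the connection:

  • Number of bytes sent in each packet.
  • Round-trip time (RTT) of each packet, which you can use to check latency.
  • Number of packets sent, received, and lost.

Windows: ping -n 100 <Private/Public IP of EC2 instance or On-Prem host>
Linux: ping -c 100 <Private/Public IP of EC2 instance or On-Prem host>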
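
If you save the ping output, a short script can turn its summary lines into numbers that you can compare across runs. The following Python sketch is a minimal example that assumes the Linux (iputils) summary format shown in its comments; the function name parse_ping_summary and the sample output are hypothetical, not taken from a real test.

```python
import re

# Summary lines printed by iputils ping on Linux, for example:
#   100 packets transmitted, 98 received, 2% packet loss, time 99123ms
#   rtt min/avg/max/mdev = 21.301/23.456/40.789/2.050 ms
COUNTS_RE = re.compile(r"(\d+) packets transmitted, (\d+) (?:packets )?received")
RTT_RE = re.compile(r"= ([\d.]+)/([\d.]+)/([\d.]+)/([\d.]+) ms")


def parse_ping_summary(output: str) -> dict:
    """Return packet counts, loss, and RTT statistics from ping output."""
    counts = COUNTS_RE.search(output)
    if counts is None:
        raise ValueError("no ping summary line found")
    sent, received = int(counts.group(1)), int(counts.group(2))
    summary = {
        "sent": sent,
        "received": received,
        # Derive loss from the counters instead of the rounded percentage.
        "loss_pct": 100.0 * (sent - received) / sent if sent else 0.0,
    }
    rtt = RTT_RE.search(output)
    if rtt is not None:  # the RTT line is missing when every packet is lost
        keys = ("rtt_min_ms", "rtt_avg_ms", "rtt_max_ms", "rtt_mdev_ms")
        summary.update(zip(keys, map(float, rtt.groups())))
    return summary


# Hypothetical output from: ping -c 100 10.0.1.25
sample = """--- 10.0.1.25 ping statistics ---
100 packets transmitted, 98 received, 2% packet loss, time 99123ms
rtt min/avg/max/mdev = 21.301/23.456/40.789/2.050 ms"""

print(parse_ping_summary(sample))
```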

Test 2: Traceroute test (ICMP and TCP based)

Traceroute is a utility that records the route between two network hosts. For each hop along the route, it also displays the round-trip time of the probes sent to that hop.

ICMP-based traceroute:

Windows: tracert <Private/Public IP of EC2 instance or On-Prem host>
Linux: traceroute <Private/Public IP of EC2 instance or On-Prem host>

TCP-based traceroute:

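Windows (third-party tracetcp.exe): tracetcp <Private/Public IP of EC2 instance or On-Prem host> -n
Linux (requires root): sudo traceroute -T -p 80 <Private/Public IP of EC2 instance or On-Prem host>

Replace port 80 with a TCP port that your security groups, network ACLs, and firewalls allow to the destination.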
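
Because traceroute reports the round-trip time to every hop, the hop where latency jumps is a good place to focus. The following Python sketch parses Linux traceroute output (run with -n) and reports the largest increase in mean RTT between consecutive responding hops. The function names and the sample output are hypothetical. Routers often deprioritize replies to probes, so a jump that does not persist to the later hops is not necessarily a problem.

```python
import re
from statistics import mean
from typing import List, Optional, Tuple

HOP_RE = re.compile(r"^\s*(\d+)\s+(.*)$")       # " 3  172.16.0.1  22.104 ms ..."
RTT_RE = re.compile(r"([\d.]+) ms")              # every "<value> ms" probe result
IP_RE = re.compile(r"\d{1,3}(?:\.\d{1,3}){3}")   # first IPv4 address on the line

Hop = Tuple[int, Optional[str], Optional[float]]


def parse_traceroute(output: str) -> List[Hop]:
    """Return (hop number, first responding IP, mean RTT in ms) for each hop."""
    hops = []
    for line in output.splitlines():
        match = HOP_RE.match(line)
        if match is None:
            continue  # skip the "traceroute to ..." header
        number, rest = int(match.group(1)), match.group(2)
        rtts = [float(value) for value in RTT_RE.findall(rest)]
        ip = IP_RE.search(rest)
        # A hop that printed only "* * *" has no IP and no RTT.
        hops.append((number, ip.group(0) if ip else None, mean(rtts) if rtts else None))
    return hops


def largest_latency_jump(hops: List[Hop]) -> Optional[Tuple[int, float]]:
    """Return (hop number, RTT increase in ms) for the biggest jump between responding hops."""
    responding = [(number, rtt) for number, _, rtt in hops if rtt is not None]
    best = None
    for (_, prev_rtt), (number, rtt) in zip(responding, responding[1:]):
        increase = rtt - prev_rtt
        if best is None or increase > best[1]:
            best = (number, increase)
    return best


# Hypothetical output from: traceroute -n 10.0.1.25
sample = """traceroute to 10.0.1.25 (10.0.1.25), 30 hops max, 60 byte packets
 1  192.168.1.1  0.512 ms  0.498 ms  0.476 ms
 2  * * *
 3  172.16.0.1  22.104 ms  22.311 ms  22.087 ms
 4  10.0.1.25  23.010 ms  22.950 ms  23.100 ms"""

hops = parse_traceroute(sample)
print(largest_latency_jump(hops))
```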

Test 3: MTR test (ICMP and TCP based)

MTR combines the functionality of the traceroute and ping programs in a single network diagnostic tool. It is available for Linux and for Windows (as WinMTR). Use it to measure packet loss and latency at each hop on the path to the destination.

ICMP-based MTR:

mtr -n -c 200 <Private/Public IP of EC2 instance or On-Prem host> --report

TCP-based MTR:

mtr -n -T -c 200 <Private/Public IP of EC2 instance or On-Prem host> --report

NOTE: On Linux, MTR version 0.85 and later support TCP-based tests (-T). WinMTR does not support TCP-based MTR. You can download WinMTR from https://sourceforge.net/projects/winmtr/
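
When you read an MTR report, loss that appears only at an intermediate hop is usually that router rate-limiting its replies, while loss that starts at some hop and carries through to the destination is what affects your traffic. The following Python sketch parses the hop rows of a --report run and finds where persistent loss begins. It assumes the common "N.|-- host Loss% Snt Last Avg Best Wrst StDev" row layout; the function names and the sample report are hypothetical.

```python
import re
from typing import List, NamedTuple, Optional

# One hop row of "mtr -n --report" output, for example:
#   3.|-- 172.16.0.1   1.0%   200   22.1  22.3  21.9  30.4   0.9
ROW_RE = re.compile(
    r"^\s*(\d+)\.\S*\s+(\S+)\s+([\d.]+)%?"                              # hop, host, Loss%
    r"\s+(\d+)\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)"  # Snt Last Avg Best Wrst StDev
)


class MtrHop(NamedTuple):
    hop: int
    host: str
    loss_pct: float
    avg_ms: float
    worst_ms: float


def parse_mtr_report(output: str) -> List[MtrHop]:
    """Parse the hop rows of an MTR report, skipping the Start/HOST header lines."""
    hops = []
    for line in output.splitlines():
        m = ROW_RE.match(line)
        if m:
            hops.append(MtrHop(int(m.group(1)), m.group(2), float(m.group(3)),
                               float(m.group(6)), float(m.group(8))))
    return hops


def loss_start_hop(hops: List[MtrHop]) -> Optional[int]:
    """Return the hop where loss begins and persists to the destination, or None."""
    if not hops or hops[-1].loss_pct == 0.0:
        return None  # the destination itself shows no loss
    start = hops[-1].hop
    for h in reversed(hops[:-1]):
        if h.host == "???":
            continue  # hop never answered, so it tells us nothing either way
        if h.loss_pct == 0.0:
            break
        start = h.hop
    return start


# Hypothetical report from: mtr -n -c 200 10.0.1.25 --report
sample = """Start: 2024-01-01T00:00:00+0000
HOST: onprem-host                 Loss%   Snt   Last   Avg  Best  Wrst StDev
  1.|-- 192.168.1.1                0.0%   200    0.4   0.5   0.3   1.2   0.1
  2.|-- ???                       100.0   200    0.0   0.0   0.0   0.0   0.0
  3.|-- 172.16.0.1                 1.0%   200   22.1  22.3  21.9  30.4   0.9
  4.|-- 10.0.1.25                  1.5%   200   23.0  23.2  22.8  41.0   1.6"""

print(loss_start_hop(parse_mtr_report(sample)))
```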

Test 4: iPerf3 test

iPerf3 is a tool for actively measuring the maximum achievable bandwidth on IP networks. It supports tuning various parameters related to timing, buffers, and protocols (TCP, UDP, and SCTP, with IPv4 and IPv6). For each test, it reports the bandwidth, loss, and other parameters.

On the server, run the following command. It is the same for every iperf3 test; only the client-side command varies.

iperf3 -s -V

On the client:

10 parallel TCP streams: iperf3 -c <Private/Public IP of EC2 instance or On-Prem host> -P 10 -t 30
20 parallel TCP streams: iperf3 -c <Private/Public IP of EC2 instance or On-Prem host> -P 20 -t 30
30 parallel TCP streams: iperf3 -c <Private/Public IP of EC2 instance or On-Prem host> -P 30 -t 30

Example (10 parallel TCP streams):

Server-side command:

iperf3 -s -V

Client-side command:

iperf3 -c 13.58.x.x -P 10 -t 30
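
Adding -J to any of the client commands makes iperf3 print its results as JSON, which is easier to compare across the 10, 20, and 30 stream runs. The following Python sketch reads the end.sum_sent and end.sum_received totals from such a report. The field names reflect the JSON that iperf3 3.x emits for TCP tests, but check them against your version's output; the function name and the sample numbers are hypothetical. If the aggregate keeps rising as you add streams, each individual flow is probably limited (for example, by TCP window size and latency) rather than the path as a whole.

```python
import json


def summarize_iperf3_tcp(report: str) -> dict:
    """Summarize the JSON that iperf3 -J prints for a TCP client test."""
    end = json.loads(report)["end"]
    sent, received = end["sum_sent"], end["sum_received"]
    received_mbps = received["bits_per_second"] / 1e6
    streams = len(end.get("streams", [])) or 1
    return {
        "sent_mbps": sent["bits_per_second"] / 1e6,
        "received_mbps": received_mbps,
        "per_stream_mbps": received_mbps / streams,
        # Retransmit counts are reported only where the OS exposes them (e.g. Linux).
        "retransmits": sent.get("retransmits"),
    }


# Hypothetical, heavily trimmed report from: iperf3 -c 13.58.x.x -P 10 -t 30 -J
sample = json.dumps({
    "end": {
        "streams": [{}] * 10,
        "sum_sent": {"bits_per_second": 950_000_000.0, "retransmits": 42},
        "sum_received": {"bits_per_second": 940_000_000.0},
    }
})

print(summarize_iperf3_tcp(sample))
```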

200 Mbps UDP test: iperf3 -c <Private/Public IP of EC2 instance or On-Prem host> -u -b 200M -t 30
500 Mbps UDP test: iperf3 -c <Private/Public IP of EC2 instance or On-Prem host> -u -b 500M -t 30
1 Gbps UDP test: iperf3 -c <Private/Public IP of EC2 instance or On-Prem host> -u -b 1G -t 30

Example (200 Mbps UDP test):

Server-side command:

iperf3 -s -V

Client-side command:

iperf3 -c 13.58.x.x -u -b 200M -t 30
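
For UDP, iperf3 sends at the rate set by -b without congestion control, so the loss and jitter reported at 200 Mbps, 500 Mbps, and 1 Gbps show where the path stops keeping up. The following Python sketch reads these values from a report produced with -J. It assumes the UDP totals appear under end.sum with bits_per_second, packets, lost_packets, and jitter_ms fields, as in iperf3 3.x JSON output, so confirm the layout for your version. The function name and the sample values are hypothetical.

```python
import json


def summarize_iperf3_udp(report: str, target_mbps: float) -> dict:
    """Summarize the JSON that iperf3 -u -J prints for a UDP client test."""
    total = json.loads(report)["end"]["sum"]
    achieved_mbps = total["bits_per_second"] / 1e6
    packets, lost = total["packets"], total["lost_packets"]
    return {
        "achieved_mbps": achieved_mbps,
        "pct_of_target": 100.0 * achieved_mbps / target_mbps,
        "loss_pct": 100.0 * lost / packets if packets else 0.0,
        "jitter_ms": total["jitter_ms"],
    }


# Hypothetical, trimmed report from: iperf3 -c 13.58.x.x -u -b 500M -t 30 -J
sample = json.dumps({
    "end": {
        "sum": {
            "bits_per_second": 450_000_000.0,
            "packets": 1_300_000,
            "lost_packets": 65_000,
            "jitter_ms": 0.85,
        }
    }
})

print(summarize_iperf3_udp(sample, target_mbps=500))
```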

Window size 128K: iperf3 -c <Private/Public IP of EC2 instance or On-Prem host> -w 128K -t 30
Window size 512K: iperf3 -c <Private/Public IP of EC2 instance or On-Prem host> -w 512K -t 30
Window size 1024K: iperf3 -c <Private/Public IP of EC2 instance or On-Prem host> -w 1024K -t 30

Example (128K window size):

Server-side command:

iperf3 -s -V

Client-side command:

iperf3 -c 13.58.x.x -w 128K -t 30
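
The window-size tests matter because a single TCP stream can have at most one window of unacknowledged data in flight per round trip, so its throughput is capped at roughly window size divided by RTT (the bandwidth-delay product). The following Python sketch applies that arithmetic to the three -w values above. It assumes iperf3 reads "128K" as 128 x 1024 bytes, and the 50 ms RTT is a hypothetical figure; the operating system may also adjust the buffer size that iperf3 requests, so treat the results as rough upper bounds.

```python
def max_tcp_throughput_mbps(window_bytes: int, rtt_ms: float) -> float:
    """Upper bound for one TCP stream: at most one window of data per round trip."""
    bits_per_second = window_bytes * 8 * 1000 / rtt_ms
    return bits_per_second / 1e6


def window_needed_bytes(target_mbps: float, rtt_ms: float) -> float:
    """Bandwidth-delay product: bytes that must be in flight to sustain target_mbps."""
    return target_mbps * 1e6 / 8 * rtt_ms / 1000


# Hypothetical RTT: substitute the average RTT you measured with ping or MTR.
RTT_MS = 50.0
for label, window in (("128K", 128 * 1024), ("512K", 512 * 1024), ("1024K", 1024 * 1024)):
    print(f"{label}: <= {max_tcp_throughput_mbps(window, RTT_MS):.1f} Mbps per stream")
print(f"Window needed for 1 Gbps: {window_needed_bytes(1000, RTT_MS) / 1e6:.2f} MB")
```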

Test 5: Packet captures (Wireshark or tcpdump)

On the EC2 instances and on-premises hosts, capture packets while the issue is occurring. In the captures, you can examine TCP-level behavior such as retransmissions, duplicate ACKs, and out-of-order packets.

Note: When debugging performance issues, capture packets concurrently on your EC2 instance and your on-premises host. Matching the request and response packets on both ends makes it easier to tell whether the problem is at the network layer or the application layer. Also, start the captures before you start the traffic so that each flow is captured completely, including its TCP handshake.

  1. Install Wireshark [1] and take a packet capture. On hosts without a graphical interface, you can capture with tcpdump instead (for example, tcpdump -i <interface> -w capture.pcap host <peer IP>) and then open the file in Wireshark.

  2. To isolate the TCP streams initiated by a specific source, apply the following display filter, which shows only the initial SYN packets sent by that source IP (replace source_IP with the actual address):

(ip.src == source_IP) && (tcp.flags.syn == 1) && (tcp.flags.ack == 0)

  3. Select the row with the relevant source IP and destination IP.

  4. Open the context (right-click) menu, and then choose Follow, TCP Stream. Wireshark then shows only the TCP flow between the source IP and destination IP that you want to investigate.

  5. Look for retransmissions, duplicate ACKs, or TCP window notifications such as [TCP Window Full] or [TCP ZeroWindow]. These notifications might indicate that the TCP buffers are running out of space.
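
Instead of scrolling through each stream, you can let Wireshark's TCP expert analysis flag these symptoms. The following Python sketch builds a display filter from the tcp.analysis fields that Wireshark sets for retransmissions, duplicate ACKs, out-of-order segments, window-full, and zero-window events. Paste the result into the display filter bar, or pass it to tshark with -r capture.pcap -Y. The helper name and the IP address are placeholders.

```python
# Wireshark expert-analysis fields for the TCP symptoms described in the steps above.
SYMPTOM_FIELDS = {
    "retransmission": "tcp.analysis.retransmission",
    "duplicate_ack": "tcp.analysis.duplicate_ack",
    "out_of_order": "tcp.analysis.out_of_order",
    "window_full": "tcp.analysis.window_full",
    "zero_window": "tcp.analysis.zero_window",
}


def symptom_filter(host_ip: str, symptoms=None) -> str:
    """Build a display filter for the chosen symptoms on traffic to or from host_ip."""
    names = symptoms or list(SYMPTOM_FIELDS)
    fields = " || ".join(SYMPTOM_FIELDS[name] for name in names)
    return f"ip.addr == {host_ip} && ({fields})"


# 10.0.1.25 is a placeholder for the EC2 instance or on-premises host IP.
print(symptom_filter("10.0.1.25"))
print(symptom_filter("10.0.1.25", ["retransmission", "zero_window"]))
```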

If you find packet loss, or if the number of hops differs significantly from your baseline measurements, refer to your networking equipment vendor's documentation. If you operate in a multi-homed network environment, repeat these tests over a different ISP.

References:

[1] Wireshark download page: https://www.wireshark.org/download.html

[2] Once you identify the issue, see this article for recommendations to improve network performance on AWS and hybrid networks: https://aws.amazon.com/blogs/networking-and-content-delivery/improving-performance-on-aws-and-hybrid-networks/

[3] AWS Knowledge Center: https://repost.aws/knowledge-center/network-issue-vpc-onprem-ig