First ping/connect through TGW and GWLB always slow

0

Hello,

We implemented a centralized inspection setup with multiple spokes connected to a Transit Gateway, which then redirects all flows to the inspection VPC that hosts a GWLB and a third-party firewall.

When I do a ping/connect inside a single spoke VPC, all pings/connects are equally "fast".

However, when I do the same from spoke to spoke (via the central inspection), the very first ping/connect for a given flow is always slow.

For example, let's say I do 100 pings from A to B:

  • first ping is ~5ms
  • all the others are ~2ms

I see the same behavior for a curl time_connect or mtr test.
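For anyone who wants to reproduce the comparison, here is a minimal Python sketch of the same measurement. It is not from the original post: the function names `connect_times_ms` and `first_vs_rest` are made up for illustration. It times repeated TCP handshakes and compares the first sample with the median of the rest. Note that each call opens a new socket, usually with a new ephemeral source port, so consecutive connects are not necessarily the same 5-tuple.

```python
import socket
import statistics
import time


def connect_times_ms(host, port, count=100, timeout=2.0):
    """Open `count` TCP connections one after another and time each handshake."""
    samples = []
    for _ in range(count):
        start = time.perf_counter()
        # create_connection returns once the TCP handshake has completed.
        with socket.create_connection((host, port), timeout=timeout):
            samples.append((time.perf_counter() - start) * 1000.0)
    return samples


def first_vs_rest(samples_ms):
    """Compare the first sample with the median of all later samples."""
    if len(samples_ms) < 2:
        raise ValueError("need at least two samples")
    first = samples_ms[0]
    rest_median = statistics.median(samples_ms[1:])
    return {
        "first_ms": first,
        "rest_median_ms": rest_median,
        "extra_ms": first - rest_median,
    }
```

Usage would look like `first_vs_rest(connect_times_ms(host, 443))`, where `host` is the address of an instance in the other spoke. Using the median for the remaining samples keeps a single outlier from hiding or exaggerating the first-packet difference.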

I suspect that the TGW appliance mode hash algorithm and/or the GWLB takes a few milliseconds to set up the initial flow. Can anybody confirm this behavior?

I tested again with the target VPC behind yet another TGW (so crossing an additional TGW peering). In this test, the first ping is even slower (> 10 ms), and after that I see 1-3 ms. So it looks like the TGW is certainly (also) adding to the first-flow latency. Can anybody confirm?

asked a year ago · 296 views

2 Answers
1

The behavior you're observing with the first ping/connect being slower than subsequent ones in a centralized inspection setup using Gateway Load Balancer (GWLB) and Transit Gateway (TGW) is indeed a known characteristic of this architecture.

This initial latency is primarily due to the setup time required for the first packet of a new flow. When a new connection is established, several processes occur:

  1. The Transit Gateway needs to determine the appropriate route for the traffic.
  2. The GWLB needs to select an appropriate target (in this case, a firewall instance) to handle the traffic.
  3. The firewall appliance needs to process the first packet and establish the session.

These initial setup processes contribute to the higher latency for the first ping or connection. Once the flow is established, subsequent packets can follow the same path without needing this initial setup, resulting in lower latency.
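The effect of the steps above can be pictured with a toy flow table. This is only a sketch and does not describe how TGW, GWLB, or any firewall is actually implemented. The class name `FlowTable` and the round-robin target choice are assumptions, and the 3 ms setup and 2 ms forwarding figures are picked to match the numbers in the question, not measured component costs.

```python
class FlowTable:
    """Toy model: the first packet of a flow pays a setup cost, later ones do not."""

    def __init__(self, targets, setup_ms=3.0, forward_ms=2.0):
        self.targets = list(targets)
        self.setup_ms = setup_ms      # assumed one-time cost per new flow
        self.forward_ms = forward_ms  # assumed steady-state cost per packet
        self.flows = {}               # flow key -> chosen target

    def send(self, flow_key):
        """Return (target, simulated latency in ms) for one packet."""
        latency = self.forward_ms
        if flow_key not in self.flows:
            # First packet of the flow: pick a target and remember the choice.
            index = len(self.flows) % len(self.targets)
            self.flows[flow_key] = self.targets[index]
            latency += self.setup_ms
        return self.flows[flow_key], latency
```

With `FlowTable(["fw-a", "fw-b"])`, the first `send` for a given key returns 5.0 ms and every later `send` for the same key returns 2.0 ms on the same target, which is the pattern described in the question.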

The TGW appliance mode's flow hash algorithm does play a role in this process. With appliance mode enabled, the Transit Gateway selects one network interface in the inspection VPC for a flow and uses it for both directions for the life of that flow, so forward and return traffic reach the same firewall instance, which is crucial for stateful inspection. This consistent routing is necessary but can contribute to the initial latency.

Additionally, the GWLB uses GENEVE encapsulation to forward traffic to the firewall appliances. This encapsulation, while efficient, adds some per-packet processing overhead.

It's worth noting that while this initial latency is noticeable, it's generally considered acceptable given the benefits of centralized inspection architecture. The architecture provides consistent security controls across your VPCs while allowing for scalability and easier management of security appliances.

To optimize performance, you could consider tuning the timeouts on various components (client, server, firewall, and OS levels) and ensuring that your firewall appliances are properly sized to handle the expected traffic load. However, some level of initial latency is inherent to this architecture and is a trade-off for the security and management benefits it provides.
Sources
Hyperplane TGW latency | AWS re:Post
NAT on Palo FW appliance with Gateway Load Balancer instead of using NAT Gateway | AWS re:Post
Simplify firewall deployments using centralized inspection architecture with Gateway Load Balancer | AWS Public Sector Blog

answered a year ago

AWS
EXPERT

reviewed a year ago

0

The latency of the initial connection can also be explained by how Gateway Load Balancer (GWLB) and Transit Gateway handle flows. For TCP/UDP traffic, GWLB uses 5-tuple stickiness (source IP, destination IP, source port, destination port, protocol), while for all other traffic it uses 3-tuple stickiness (source IP, destination IP, protocol). When a new flow starts, GWLB has to establish this stickiness and select a target appliance for the flow.
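To make the 5-tuple versus 3-tuple distinction concrete, here is a small Python sketch. It is an illustration only: `flow_key` and `pick_target` are hypothetical names, and the SHA-256 based selection is a stand-in, since this thread does not say which hash GWLB actually uses.

```python
import hashlib

TCP, UDP, ICMP = 6, 17, 1  # IANA protocol numbers


def flow_key(src_ip, dst_ip, protocol, src_port=None, dst_port=None):
    """Build the stickiness key: 5-tuple for TCP/UDP, 3-tuple for anything else."""
    if protocol in (TCP, UDP):
        return (src_ip, dst_ip, src_port, dst_port, protocol)
    return (src_ip, dst_ip, protocol)


def pick_target(key, targets):
    """Map a flow key to one target deterministically (illustrative hash only)."""
    digest = hashlib.sha256(repr(key).encode()).digest()
    return targets[int.from_bytes(digest[:8], "big") % len(targets)]
```

Under this model, all pings between the same two hosts share one key, while two TCP connections that differ only in source port produce two different keys and therefore count as two separate flows.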

Sources
https://repost.aws/questions/QUomd1wdNYTPOMq8QUpSQcnQ/hyperplane-tgw-latency
https://repost.aws/questions/QUs-FovHmIRLKcSJWrGhESiQ/nat-on-palo-fw-appliance-with-gateway-load-balancer-instead-of-using-nat-gateway
https://aws.amazon.com/blogs/publicsector/simplify-firewall-deployments-using-centralized-inspection-architecture-with-gateway-load-balancer/

AWS

answered a year ago

EXPERT

reviewed a year ago
