External Traffic Fails to Reach EKS Pods with NLB Client IP Preservation

4 minute read
Content level: Advanced

This article discusses intermittent connection failures for a service hosted on Amazon EKS and exposed via a Network Load Balancer (NLB) with client IP preservation enabled.

Note: This article applies to customers running Amazon VPC CNI versions earlier than 1.8.0. The issue was fixed in later versions.

GitHub issue here

In this article, we'll address a common issue faced by our customers, involving sporadic connection failures when accessing a service exposed through a Network Load Balancer (NLB) with client IP preservation enabled.

Problem Description:

Customers have reported that external traffic fails to reach their pods even though flow logs show correct routing through the load balancer. tcpdump analysis confirms that traffic reaches the primary IP of the Elastic Network Interface (ENI) hosting the pod's IP address, but never reaches the pod itself. Internal VPC traffic, by contrast, reaches the pod without issues.
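One way to observe this behavior is to capture traffic on the node's interfaces while running the external client. The commands below are a diagnostic sketch, not from the original article; the client IP, interface names, and port are placeholders you would substitute with your own values:

```shell
# Placeholder values -- substitute the real external client IP and service port.
CLIENT_IP=203.0.113.10
SVC_PORT=8080

# On the worker node: packets from the external client are visible on the
# secondary ENI (here assumed to be eth1) that hosts the pod's IP address...
sudo tcpdump -ni eth1 "host ${CLIENT_IP} and port ${SVC_PORT}"

# ...while no corresponding response packets go back out, because the kernel
# drops the inbound packets before they reach the pod.
```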

This article will delve into the root causes and solutions for this connectivity problem.

How to reproduce the issue:

  1. Create an EKS cluster with a private cluster endpoint.
  2. Create worker nodes in public subnets with an internet gateway (IGW) attached.
  3. Create the Network Load Balancer service with the following annotations:
   service.beta.kubernetes.io/aws-load-balancer-backend-protocol: tcp
   service.beta.kubernetes.io/aws-load-balancer-target-group-attributes: preserve_client_ip.enabled=true
   service.beta.kubernetes.io/aws-load-balancer-type: nlb-ip
  4. Run the following curl loop against the NLB from an external client (for example, a bastion host):
while true; do curl -s -o /dev/null -w "%{http_code}%" <Load_Balancer_Name>; done

Replace <Load_Balancer_Name> with your load balancer's DNS name.
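For reference, a minimal Service manifest carrying these annotations might look like the following. This is a sketch: the Service name, selector, and ports are hypothetical examples, not from the original article.

```shell
# Hypothetical Service manifest; name, selector, and ports are examples only.
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: Service
metadata:
  name: demo-nlb
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-backend-protocol: tcp
    service.beta.kubernetes.io/aws-load-balancer-target-group-attributes: preserve_client_ip.enabled=true
    service.beta.kubernetes.io/aws-load-balancer-type: nlb-ip
spec:
  type: LoadBalancer
  selector:
    app: demo
  ports:
    - port: 80
      targetPort: 8080
EOF
```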

When the "nlb-ip" annotation is specified, the pods are registered directly as targets of the load balancer, which is why the NLB can send traffic straight to the appropriate pod. However, when a pod sends return traffic to the NLB, the pod's private IP address is translated to the primary private IP address assigned to the primary network interface of the node.

The observed result is that the NLB discards the packet, and the response is never sent back to the client.

Underlying Cause:

The root of this issue lies at the host level, specifically within the realm of host-level networking. To address this, Linux systems implement a safeguard known as Reverse Path Filtering (rp_filter), designed to thwart potential threats arising from IP address spoofing. This mechanism operates in three distinct modes: rp_filter 0, 1, and 2.

1. rp_filter=0 (No Source Address Validation):

  • In this mode, the system bypasses source address validation entirely.
  • Consequently, any incoming packet is permitted to traverse and reach its destination network, unchecked.

2. rp_filter=1 (Strict Interface Validation):

  • With rp_filter set to 1, the kernel checks each incoming packet's source address and verifies that the reply would leave through the same interface the packet arrived on.
  • If the arrival interface does not match that best return path, the packet is discarded.

3. rp_filter=2 (Asymmetric Routing Tolerance):

  • In loose mode, an incoming packet is accepted as long as its source address is reachable via any interface, even when the return traffic departs through a different interface (asymmetric routing).
  • A packet is dropped only when its source address is not routable at all.

In short, the chosen mode determines whether packets flow unchecked (0), must pass a strict interface match (1), or may take asymmetric routes while the source address is still validated (2).
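The per-interface rp_filter values can be inspected directly through procfs. A minimal sketch for a Linux host:

```shell
# List the current Reverse Path Filtering mode for every interface the
# kernel knows about (0 = off, 1 = strict, 2 = loose).
for f in /proc/sys/net/ipv4/conf/*/rp_filter; do
    printf '%s = %s\n' "${f}" "$(cat "${f}")"
done
```

On an affected worker node, the entries for the secondary ENIs (for example eth1, eth2) would show 1, i.e. strict mode.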

The reverse path filter on secondary ENIs attached to the Linux worker nodes was set to strict mode (rp_filter=1). All ingress traffic on secondary interfaces with a source IP outside the VPC CIDR ranges was therefore dropped without further processing. If rp_filter is made fully permissive (0), incoming packets on non-eth0 interfaces do reach the pods; however, the return path for non-VPC traffic is always through eth0. This asymmetry causes the return packets to bypass the NLB, so the affected connections fail.

This was the culprit, and it can be resolved by setting the rp_filter value to 2 (loose mode) on the secondary interfaces.
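On nodes that cannot yet be upgraded to VPC CNI 1.8.0 or later, the workaround can be applied as a sysctl change. A sketch, assuming root access and that the secondary ENIs appear as eth1 and eth2 (check your node with `ip -br link`):

```shell
# Switch Reverse Path Filtering to loose mode (2) at runtime.
# Interface names are examples; substitute the node's secondary ENIs.
sudo sysctl -w net.ipv4.conf.eth1.rp_filter=2
sudo sysctl -w net.ipv4.conf.eth2.rp_filter=2

# Persist the setting across reboots.
cat <<'EOF' | sudo tee /etc/sysctl.d/99-rp-filter.conf
net.ipv4.conf.eth1.rp_filter = 2
net.ipv4.conf.eth2.rp_filter = 2
EOF
```

In practice this would be rolled out to every worker node, for example via node user data or a privileged DaemonSet; upgrading the VPC CNI remains the preferred long-term fix.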

Co-Authored by: Sidhartha Kotha

1 Comment

Is this a fix that customers need to implement themselves?

Pionerd
replied 9 months ago