The intermittent timeouts you're seeing between your EKS pods and the private Network Load Balancer (NLB) in the same Availability Zone (AZ) are a known problem in certain EKS and NLB configurations. The behavior is likely related to how the NLB handles traffic from pods within the same AZ.
There are a few potential causes and solutions to consider:
- Client IP preservation: The NLB's client IP preservation feature might be causing issues with pod-to-NLB communication. With client IP preservation enabled, an internal NLB does not support hairpinning (loopback): if a connection is routed back to a target on the same instance the client runs on, the source and destination IP addresses match and the connection times out. Because only some connections land on the same instance, the failures look intermittent. Try disabling client IP preservation on the NLB's target group. This has resolved similar issues for other users.
- Cross-zone load balancing: Ensure that cross-zone load balancing is enabled on your NLB. This allows the load balancer to distribute traffic across all registered targets in all enabled Availability Zones, which may help mitigate the issue.
- Security groups and network ACLs: Double-check that the security groups and network ACLs associated with your EKS nodes, pods, and NLB allow traffic between them. Check both inbound and outbound rules.
- VPC CNI plugin configuration: Verify that the Amazon VPC CNI plugin for Kubernetes is correctly configured. Misconfiguration can lead to networking issues between pods and other AWS resources.
- Proxy Protocol v2: If you're using the AWS Load Balancer Controller to deploy your NLB, check whether Proxy Protocol v2 (PPv2) is enabled on the target group. When it is, the NLB prepends a PPv2 header to each connection, and a backend that does not parse that header can fail to connect. Disable PPv2 unless your targets support it.
- DNS resolution: Check for DNS resolution issues within your cluster. Ensure that the NLB's DNS name resolves to the expected IP addresses.
- Routing tables: Verify that the route tables for your VPC subnets allow traffic between the EKS pods and the NLB.
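The attribute changes suggested above (client IP preservation, PPv2, cross-zone load balancing) could be applied with boto3 roughly as follows. This is a minimal sketch, not a definitive procedure: the helper names `target_group_attributes` and `apply_attributes` are made up for illustration, the ARNs are placeholders you would supply, and it assumes boto3 is installed with credentials that can modify the load balancer.

```python
from typing import Dict, List


def target_group_attributes(preserve_client_ip: bool,
                            proxy_protocol_v2: bool) -> List[Dict[str, str]]:
    """Build the attribute list for an NLB target group (hypothetical helper)."""
    return [
        {"Key": "preserve_client_ip.enabled", "Value": str(preserve_client_ip).lower()},
        {"Key": "proxy_protocol_v2.enabled", "Value": str(proxy_protocol_v2).lower()},
    ]


def apply_attributes(target_group_arn: str, load_balancer_arn: str) -> None:
    """Disable client IP preservation and PPv2, and enable cross-zone balancing."""
    # Imported here so the helper above can be used without AWS access.
    import boto3

    elbv2 = boto3.client("elbv2")
    # Target group level: client IP preservation and Proxy Protocol v2 off.
    elbv2.modify_target_group_attributes(
        TargetGroupArn=target_group_arn,
        Attributes=target_group_attributes(False, False),
    )
    # Load balancer level: cross-zone load balancing on.
    elbv2.modify_load_balancer_attributes(
        LoadBalancerArn=load_balancer_arn,
        Attributes=[{"Key": "load_balancing.cross_zone.enabled", "Value": "true"}],
    )
```

If the NLB is managed by the AWS Load Balancer Controller, changes made directly through the API may be reconciled away. In that case the same target group attributes are usually set on the Service instead, for example through the `service.beta.kubernetes.io/aws-load-balancer-target-group-attributes` annotation.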
If the issue persists after checking these potential causes, it's recommended to open a support case with AWS. They can help investigate the problem more deeply, as it may be related to the specific configuration of your EKS cluster, VPC, or NLB.
Additionally, consider using AWS VPC Flow Logs or packet captures to gain more insight into where exactly the traffic is being dropped or timing out. This can provide valuable information for troubleshooting the issue.
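Before reaching for packet captures, a small probe run from an affected pod can show whether the timeouts cluster on one of the NLB's IP addresses. The sketch below uses only the Python standard library; the function names and the default attempt count and timeout are illustrative assumptions, and the hostname and port are whatever your NLB exposes.

```python
import socket
import time
from collections import defaultdict
from typing import Dict, List


def resolve_ipv4(hostname: str, port: int) -> List[str]:
    """Return the distinct IPv4 addresses the hostname resolves to."""
    infos = socket.getaddrinfo(hostname, port,
                               family=socket.AF_INET, type=socket.SOCK_STREAM)
    return sorted({info[4][0] for info in infos})


def probe(ip: str, port: int, timeout: float = 2.0) -> bool:
    """Attempt one TCP connection; True if the handshake completes in time."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:  # covers timeouts, refusals, and unreachable hosts
        return False


def probe_all(hostname: str, port: int, attempts: int = 20,
              timeout: float = 2.0, pause: float = 0.0) -> Dict[str, Dict[str, int]]:
    """Probe every resolved IP several times and count successes and failures."""
    results = defaultdict(lambda: {"ok": 0, "failed": 0})
    for ip in resolve_ipv4(hostname, port):
        for _ in range(attempts):
            key = "ok" if probe(ip, port, timeout) else "failed"
            results[ip][key] += 1
            time.sleep(pause)
    return dict(results)
```

Calling `probe_all` with the NLB's DNS name and listener port from a pod in the affected AZ gives a per-IP tally. If failures concentrate on the NLB address in the pod's own AZ, that is consistent with the same-AZ behavior described above, and VPC Flow Logs for that address are a good next place to look.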
Sources
- Resolve Network Load Balancer's TCP connectivity errors | AWS re:Post
- Intermittent Failures in Pod Communications with Internal AWS Network Load Balancer | AWS re:Post
- App Mesh connectivity troubleshooting - AWS App Mesh
