Getting connection reset errors in pods


Hi All,

Previously our infrastructure ran on EC2, with our services and Elasticsearch on separate instances. We recently moved our services to EKS, and now random API calls are throwing connection reset exceptions. Any help sorting out this issue would be appreciated. The EKS cluster and Elasticsearch are in different VPCs.

2 Answers

Hi,

Do you run Cilium in your EKS cluster? It is actively used in EKS: see https://isovalent.com/blog/post/cilium-eks-anywhere/

If so, make sure you are running a recent version of Cilium (newer than 1.12.3).

See https://github.com/cilium/cilium/issues/21853 for full details.

Best,

Didier

AWS
EXPERT
answered a month ago
EXPERT
reviewed a month ago

The connection reset issue you're experiencing with your EKS pods communicating with Elasticsearch in a different VPC could be due to several factors. Here are some potential causes and solutions to consider:

  1. Network Configuration: Ensure that proper VPC peering or AWS Transit Gateway is set up between the VPCs hosting your EKS cluster and Elasticsearch. Verify that the necessary routes are configured in both VPCs' route tables.

  2. Security Groups: Check that the security groups for both your EKS nodes and Elasticsearch allow the required inbound and outbound traffic on the appropriate ports (typically 9200 and 9300 for Elasticsearch).

  3. Network Policies: If you're using Kubernetes Network Policies, make sure they're not inadvertently blocking traffic to your Elasticsearch VPC.

  4. DNS Resolution: Verify that DNS resolution is working correctly between the VPCs, especially if you're using private hosted zones.

  5. Connection Timeouts: The connection reset could be due to timeouts. Try increasing timeout settings across your services, including any proxies or load balancers in between.

  6. Load Balancer Configuration: If you're using a load balancer to access Elasticsearch, ensure it's properly configured. In some cases, setting "Preserve client IP addresses" to false on the load balancer has resolved similar issues.

  7. EKS CNI (Container Network Interface) Issues: There could be problems with the Amazon VPC CNI's communication with the Kubernetes API server. Ensure your CNI add-on is up to date and properly configured.

  8. Pod IP Address Assignment: Verify that pods are being assigned IP addresses correctly and that there are sufficient IP addresses available in your VPC's subnets.
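A quick way to narrow down several of the causes above (routing, security groups, DNS) is to classify exactly how the connection fails from inside an affected pod. Below is a minimal stdlib Python sketch (the hostname and port in the usage note are placeholders for your Elasticsearch endpoint): a DNS failure points at cross-VPC name resolution, a timeout usually means a routing or security-group drop, and an immediate reset points at a middlebox such as a load balancer or conntrack:

```python
import socket

def check_endpoint(host: str, port: int, timeout: float = 5.0) -> str:
    """Classify basic TCP reachability of host:port from inside a pod."""
    try:
        # Resolve first, so DNS problems are reported separately (item 4).
        addrs = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        return "dns-failure"
    family, _, _, _, sockaddr = addrs[0]
    s = socket.socket(family, socket.SOCK_STREAM)
    s.settimeout(timeout)
    try:
        s.connect(sockaddr)
        return "ok"
    except socket.timeout:
        return "timeout"            # silent drop: routing / security group (items 1-2)
    except ConnectionResetError:
        return "reset"              # RST mid-handshake: LB or conntrack (items 6-7)
    except ConnectionRefusedError:
        return "refused"            # host reachable, nothing listening on the port
    finally:
        s.close()
```

Run it from a debug pod against your endpoint, e.g. `check_endpoint("vpc-your-es-domain.es.amazonaws.com", 9200)` (hostname here is hypothetical), and compare the result from a plain EC2 instance in the same subnet to isolate whether the problem is EKS-specific.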

To further diagnose the issue:

  • Check the logs of the affected pods and any relevant networking components.
  • Use network troubleshooting tools like tcpdump or AWS VPC Flow Logs to analyze the traffic.
  • Consider using AWS X-Ray or similar tracing tools to identify where the connection is failing.
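If the resets turn out to be transient (for example, idle connections being reset by an intermediate load balancer), the application can absorb them with a small retry wrapper while the root cause is being fixed. A minimal stdlib Python sketch, assuming your client library surfaces the failure as ConnectionResetError or TimeoutError:

```python
import time

def with_retries(fn, attempts=3, base_delay=0.5,
                 retry_on=(ConnectionResetError, TimeoutError)):
    """Call fn(), retrying with linear backoff on transient network errors."""
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == attempts:
                raise  # retries exhausted: surface the original error
            time.sleep(base_delay * attempt)

# Hypothetical flaky call: fails twice with a reset, then succeeds.
calls = {"n": 0}
def flaky_search():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionResetError("connection reset by peer")
    return {"hits": 42}
```

Here `with_retries(flaky_search)` succeeds on the third attempt. This is a stopgap, not a fix: the retries mask the symptom while tcpdump or VPC Flow Logs tell you where the RST actually originates.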

If the issue persists after checking these areas, it may be beneficial to open a support case with AWS to help trace and resolve the problem, as they can provide more detailed analysis of your specific setup.
Sources
Connection Resets when connecting to internal registry from EKS | AWS re:Post
Troubleshooting Amazon EKS networking issues at scale in an Enterprise scenario | AWS re:Post
Intermittent Failures in Pod Communications with Internal AWS Network Load Balancer | AWS re:Post

answered a month ago
