Pods stuck in Init state in an EKS cluster


I have one scenario: we have two AWS EKS clusters that use the same subnets.

Blue cluster: subnet-A, subnet-B
Green cluster: subnet-A, subnet-B

I now see that subnet-A can hold at most 256 private IPs and all of them are in use; zero IPs are available. As a result, newly scheduled pods get stuck in the 'init' state.

Why this is happening: the AWS VPC CNI is unable to assign private IPs to the new pods.

Conclusion: if any subnet is exhausted and has no IPs available, pods may get stuck in the 'init' state, because we cannot control which instance a pod is scheduled on. It may land on any instance in subnet-A or subnet-B. To make sure this issue doesn't happen in the future, every subnet used by the EKS clusters needs sufficient free IP capacity.

Has anyone seen this issue before and implemented a workaround to resolve it? Also, I am not able to calculate the subnet size that would be sufficient for all the services. Is there a systematic approach for this?
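One rough way to approach the sizing question is to work backwards from the number of nodes and the IPs each node can hold. The sketch below is only an illustration under stated assumptions: it assumes the VPC CNI's default secondary-IP mode (no prefix delegation), uses the documented EKS max-pods formula, and counts the 5 addresses AWS reserves in every subnet. The function names are hypothetical, and the m5.large figures (3 ENIs, 10 IPv4 addresses per ENI) are example values to verify against the EC2 ENI limits for your own instance types.

```python
import ipaddress

# AWS reserves the first four addresses and the last address of every subnet.
AWS_RESERVED_PER_SUBNET = 5


def usable_ips(cidr: str) -> int:
    """IPs that can actually be assigned in a subnet with this CIDR."""
    net = ipaddress.ip_network(cidr)
    return max(net.num_addresses - AWS_RESERVED_PER_SUBNET, 0)


def max_pods_per_node(enis: int, ips_per_eni: int) -> int:
    """EKS max-pods formula for secondary-IP mode (assumption: no prefix delegation)."""
    return enis * (ips_per_eni - 1) + 2


def worst_case_ips_per_node(enis: int, ips_per_eni: int) -> int:
    """Upper bound on subnet IPs one node can hold: every slot on every ENI."""
    return enis * ips_per_eni


def smallest_prefix_for(required_ips: int) -> int:
    """Smallest IPv4 prefix length whose subnet still offers required_ips usable IPs."""
    needed = required_ips + AWS_RESERVED_PER_SUBNET
    host_bits = (needed - 1).bit_length()  # ceil(log2(needed)) for needed >= 1
    return 32 - host_bits


# Example values (assumptions): 10 m5.large nodes could all land in one subnet.
nodes_in_subnet = 10
enis, ips_per_eni = 3, 10

required = nodes_in_subnet * worst_case_ips_per_node(enis, ips_per_eni)
print("usable IPs in a /24:", usable_ips("10.0.0.0/24"))
print("max pods per node:", max_pods_per_node(enis, ips_per_eni))
print("worst-case IPs needed:", required)
print("suggested prefix: /%d" % smallest_prefix_for(required))
```

Because two clusters share the same subnets here, the node counts of both clusters would have to be added together before picking a prefix, and anything else in the subnet (load balancers, other ENIs) also draws from the same pool.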

Vaibhav
asked 3 months ago, 258 views
1 Answer

My company recently hit the same issue in production: we ran out of available private IPs in a subnet used by AWS EKS clusters. Pods were stuck in the 'init' state because the AWS VPC CNI (Container Network Interface) plugin could not allocate new private IPs for them.

To address it, we adjusted our Auto Scaling groups (ASGs) so that they span multiple subnets. This spreads nodes, and therefore pods, across different subnets, so new pods can draw IPs from subnets that still have free addresses.

Note that this is a workaround. For a permanent solution, we recommend moving to larger subnets so that more private IPs are available. An existing subnet's CIDR cannot be changed in place, so in practice this means creating new, larger subnets and moving the node groups to them.

If you encounter a similar issue in a production environment, I suggest engaging AWS Support to validate the solutions proposed above.
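To catch exhaustion before pods get stuck, the free IP count of each subnet can be checked periodically. The following is a minimal sketch, not something we ran in production: the function name `low_ip_subnets` and the 20% threshold are hypothetical, and the sample data only mimics the `SubnetId`, `CidrBlock` and `AvailableIpAddressCount` fields returned by the EC2 DescribeSubnets API.

```python
import ipaddress

# AWS reserves 5 addresses in every subnet, so they never count as assignable.
AWS_RESERVED_PER_SUBNET = 5


def low_ip_subnets(subnets, min_free_ratio=0.2):
    """Return (subnet_id, free, total) for subnets below the free-IP ratio.

    `subnets` is a list of dicts shaped like the "Subnets" entries of an
    EC2 DescribeSubnets response. The 0.2 default is an arbitrary example.
    """
    flagged = []
    for s in subnets:
        total = ipaddress.ip_network(s["CidrBlock"]).num_addresses - AWS_RESERVED_PER_SUBNET
        free = s["AvailableIpAddressCount"]
        if total <= 0 or free / total < min_free_ratio:
            flagged.append((s["SubnetId"], free, total))
    return flagged


# Sample data (made up for illustration). With boto3 installed and credentials
# configured, real data could come from:
#   boto3.client("ec2").describe_subnets(SubnetIds=[...])["Subnets"]
sample = [
    {"SubnetId": "subnet-A", "CidrBlock": "10.0.0.0/24", "AvailableIpAddressCount": 0},
    {"SubnetId": "subnet-B", "CidrBlock": "10.0.1.0/24", "AvailableIpAddressCount": 180},
]

for subnet_id, free, total in low_ip_subnets(sample):
    print(f"{subnet_id}: {free}/{total} IPs free")
```

A check like this only reports the problem; the subnets flagged by it would still need more capacity or fewer consumers.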

I hope this clarifies things. If it does, I would appreciate it if you accepted the answer so that the community can benefit. Thanks ;)

EXPERT
answered 3 months ago
