Pods stuck in 'init' state in an EKS cluster

I have the following scenario. We have two AWS EKS clusters that use the same subnets:

Blue cluster: subnet-A, subnet-B
Green cluster: subnet-A, subnet-B

Subnet-A can hold at most 256 private IPs, and all of them are in use: zero IPs are available. As a result, newly scheduled pods get stuck in the 'init' state.

Why this is happening: the AWS CNI is unable to assign private IPs to the new pods.

Conclusion: if any subnet is exhausted and has no IPs available, pods may get stuck in the 'init' state, because we cannot control which instance a pod is scheduled on. It may land on any instance in subnet-A or subnet-B. To prevent this in the future, every subnet used by the EKS clusters needs sufficient IP capacity.
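One way to confirm this diagnosis is to look at the free-IP count that EC2 reports per subnet. The sketch below is a minimal illustration, not a definitive tool: it assumes the input is shaped like the `Subnets` list of an EC2 DescribeSubnets response (fields `SubnetId` and `AvailableIpAddressCount`). The function name `low_ip_subnets`, the threshold of 20, and the sample data are hypothetical.

```python
# Flag subnets whose free-IP count has dropped below a threshold.
# Input is assumed to mirror the "Subnets" list of an EC2 DescribeSubnets
# response (for example, boto3's ec2.describe_subnets()["Subnets"]).

def low_ip_subnets(subnets, min_free=20):
    """Return (subnet_id, free_ips) pairs with fewer than min_free free IPs."""
    return [
        (s["SubnetId"], s["AvailableIpAddressCount"])
        for s in subnets
        if s["AvailableIpAddressCount"] < min_free
    ]

# Hypothetical sample data matching the scenario in the question.
sample = [
    {"SubnetId": "subnet-A", "AvailableIpAddressCount": 0},
    {"SubnetId": "subnet-B", "AvailableIpAddressCount": 112},
]

print(low_ip_subnets(sample))  # [('subnet-A', 0)]
```

Running a check like this on a schedule would surface an exhausted subnet before pods start failing, but the right threshold depends on how quickly the clusters scale.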

Has anyone seen this issue before and implemented a workaround? Also, I am not able to calculate the subnet size that would be sufficient for all the services. Is there a systematic approach for this?

Vaibhav
Asked 4 months ago, 271 views
1 Answer

Recently, my company hit the same issue in production: we ran out of available private IPs in a subnet used by AWS EKS clusters. Pods got stuck in the 'init' state because the AWS Container Network Interface (CNI) could not allocate new private IPs for them.

To address this, we adjusted our Auto Scaling groups (ASGs) so that they span multiple subnets. This spreads nodes, and therefore pods, across subnets, so new pods can draw on free IP addresses in subnets that still have capacity.

Note that this is only a workaround. For a permanent solution, we recommend moving to larger subnets so that more private IPs are available (the CIDR block of an existing subnet cannot be changed, so this means creating new, larger subnets). If you run into this in a production environment, I suggest engaging AWS Support to validate the approaches above.
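For the sizing question, a rough worst-case estimate can be sketched as follows. It assumes every node may eventually hold all the IPs its instance type allows (ENIs per node times IPv4 addresses per ENI), adds a headroom percentage, and accounts for the 5 addresses AWS reserves in every subnet, which is why a /24 with 256 addresses offers only 251 usable ones. The function names, the 20% headroom, and the example values (20 nodes, 3 ENIs, 10 IPs per ENI) are illustrative assumptions; check the ENI limits for your actual instance types.

```python
import math

AWS_RESERVED_PER_SUBNET = 5  # AWS reserves 5 addresses in every subnet


def usable_ips(prefix_len):
    """Usable IPv4 addresses in a subnet with the given prefix length."""
    return 2 ** (32 - prefix_len) - AWS_RESERVED_PER_SUBNET


def worst_case_ips(max_nodes, enis_per_node, ips_per_eni, headroom_pct=20):
    """Upper bound: every node holds all the IPs its ENIs can carry, plus headroom."""
    per_node = enis_per_node * ips_per_eni
    return math.ceil(max_nodes * per_node * (100 + headroom_pct) / 100)


def smallest_prefix(required_ips):
    """Smallest subnet (longest prefix) whose usable IPs cover required_ips."""
    for prefix in range(28, 15, -1):  # VPC subnets range from /28 to /16
        if usable_ips(prefix) >= required_ips:
            return prefix
    raise ValueError("requirement exceeds a /16 subnet")


# Assumed example: 20 nodes, 3 ENIs per node, 10 IPv4 addresses per ENI.
need = worst_case_ips(max_nodes=20, enis_per_node=3, ips_per_eni=10)
print(usable_ips(24))         # 251
print(need)                   # 720
print(smallest_prefix(need))  # 22
```

Because the Blue and Green clusters share the same subnets, the node counts of both clusters would need to be added together before sizing. This estimate is deliberately pessimistic; actual usage depends on the CNI's warm-pool settings and on how nodes are spread across subnets.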

I hope this clarifies things. If it does, I would appreciate it if you accepted the answer so that the community can benefit. Thanks ;)

EXPERT
Answered 4 months ago
