Kubernetes NodePool didn't get properly scaled out - errorCode=timeout?


Hi there.

There was a case last Saturday (the 18th, UTC) where one of my EKS cluster's node groups failed to scale out.

As a result, a few pods were stuck in Pending for a long while, until I manually scaled out the node group in the AWS console by increasing the desired capacity.

I checked the Cluster Autoscaler logs and found the following:

W1118 22:35:09.646968       1 clusterstate.go:264] Scale-up timed out for node group eks-on-demand-c5a-2xlarge-<ID> after 15m5.306683546s
W1118 22:35:09.647071       1 clusterstate.go:287] Disabling scale-up for node group eks-on-demand-c5a-2xlarge-<ID> until 2023-11-18 22:40:09.641115003 +0000 UTC m=+425439.629820752; errorClass=Other; errorCode=timeout

It seems the node creation timed out partway through. But I didn't get any warning or indication about this from anywhere else; even the node group page on the EKS console didn't report any node health issues.
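For context, the roughly 15-minute window in the log matches the Cluster Autoscaler's default --max-node-provision-time. A quick way to check whether a deployment overrides it, assuming the standard cluster-autoscaler Deployment in the kube-system namespace:

# Prints the flag if it is set explicitly; no output means the 15m default applies
kubectl -n kube-system describe deploy cluster-autoscaler | grep max-node-provision-time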

Is there a way to further investigate this? I just want to understand what exactly happened at that moment.

I suspect that it's an AWS-side infrastructure issue, but not 100% sure.

It hasn't happened again since the incident.

Thanks a lot.

1 Answer

Accepted Answer

It appears that the Cluster Autoscaler (CA) gave up on the scale-up because the new node did not register within the configured timeout, which in your case is 15 minutes (the 15m5.306683546s in the log). The error means the instance either failed to launch or failed to join the cluster in time. To investigate, review the Kubernetes events with kubectl get events, check the AWS CloudTrail logs for the corresponding EC2 launch calls, and inspect the EKS control plane logs (example commands below). Note that Kubernetes events are typically retained for only about an hour by default, so for a past incident the Auto Scaling group's scaling activity history and CloudTrail are your best sources. With the timestamp and date of the incident in hand, you can also contact AWS Support for further investigation, especially if the issue recurs.
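For reference, a rough investigation sketch, assuming a managed node group and placeholder names (<cluster-name>, <nodegroup-name>, <asg-name>) that you would substitute with your own:

# Recent Kubernetes events across all namespaces (only useful close to the
# incident; events expire after about an hour by default)
kubectl get events -A --sort-by=.lastTimestamp

# Find the Auto Scaling group backing the managed node group
aws eks describe-nodegroup --cluster-name <cluster-name> --nodegroup-name <nodegroup-name> \
    --query 'nodegroup.resources.autoScalingGroups[0].name' --output text

# The scaling activity history often records the launch failure reason,
# e.g. InsufficientInstanceCapacity for the requested instance type
aws autoscaling describe-scaling-activities --auto-scaling-group-name <asg-name> --max-items 50

# CloudTrail: look for RunInstances calls (and their error codes) around the incident window
aws cloudtrail lookup-events \
    --lookup-attributes AttributeKey=EventName,AttributeValue=RunInstances \
    --start-time 2023-11-18T22:00:00Z --end-time 2023-11-18T23:00:00Z

# EKS control plane logs (if logging is enabled) live in this CloudWatch Logs group
aws logs describe-log-streams --log-group-name /aws/eks/<cluster-name>/cluster

If the scaling activity history shows a failed launch (for example a capacity or quota error), that usually pinpoints the AWS-side cause directly.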

Expert
Answered 6 months ago
