Elastic Kubernetes Service: NodeCreationFailure "Unhealthy nodes in the kubernetes cluster" and warning "InvalidDiskCapacity 0 on image filesystem" when creating Node Group in EKS

Hello,

I am trying to create a Node Group in an Amazon EKS cluster and I consistently receive the following main error:

NodeCreationFailure: Unhealthy nodes in the kubernetes cluster

Additionally, I am seeing the following technical warning:

InvalidDiskCapacity invalid capacity 0 on image filesystem

I have already tried all recommended solutions, including:

Creating the EKS cluster with version 1.33 (also tested in us-east-1)

Node Group using Amazon Linux 2023 (x86_64) Standard (AL2023_x86_64_STANDARD)

Disk size set to 20 GiB

Node Group launched in a public subnet with auto-assign public IPv4 address enabled

Security Groups and IAM Roles correctly configured (AmazonEKSWorkerNodePolicy, AmazonEC2ContainerRegistryReadOnly, AmazonEKS_CNI_Policy)

Tested with only one public subnet

My account has sufficient EC2 and VPC limits/quotas

Recreated VPCs and subnets from scratch (always public)

Tried creating the Node Group both from the AWS Console and with eksctl

The EC2 nodes are launched but remain "unhealthy" in the EKS cluster.

Please assist me in reviewing the backend logs and possible account-related restrictions or issues, as all evidence suggests this is not a user configuration error.

I can provide screenshots or additional details if needed.

Thank you very much.

  • Is this a warning or an error? If it is only a warning, you can continue. The root cause is usually that the node cannot read its EBS volume size information during initial startup.

asked 8 months ago · 1.6K views
1 Answer

Based on the issues you're experiencing with your EKS node group creation, there are several potential causes for the "NodeCreationFailure: Unhealthy nodes in the kubernetes cluster" error and the "InvalidDiskCapacity invalid capacity 0 on image filesystem" warning.

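A frequent cause is the bootstrap configuration in the UserData of your worker nodes. It is what allows a node to join the EKS cluster, and it must reference the correct cluster name. When it is wrong, nodes launch as EC2 instances but fail to register with the Kubernetes cluster. The mechanism depends on the AMI: Amazon Linux 2 runs /etc/eks/bootstrap.sh, whereas Amazon Linux 2023 (the AMI type in the question) is configured by nodeadm from a NodeConfig document. For a managed node group without a custom launch template, EKS generates this UserData itself, so a wrong cluster name is less likely in that case.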
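To see what a launched instance actually received, its user data can be read back and decoded. The following is only a minimal sketch, assuming the AWS CLI is installed and configured. The function names show_user_data and bootstrap_style are hypothetical helpers, not part of any AWS tool, and the instance ID in the usage comment is a placeholder.

```shell
#!/bin/bash
# Hypothetical helpers for inspecting EC2 user data (names are illustrative).

# Print the decoded user data of one EC2 instance.
show_user_data() {
  local instance_id="$1"
  aws ec2 describe-instance-attribute \
    --instance-id "$instance_id" \
    --attribute userData \
    --query 'UserData.Value' \
    --output text | base64 --decode
}

# Read user data on stdin and report which bootstrap mechanism it appears to use.
bootstrap_style() {
  local data
  data="$(cat)"
  case "$data" in
    *node.eks.aws*)          echo "nodeadm" ;;       # AL2023 NodeConfig document
    */etc/eks/bootstrap.sh*) echo "bootstrap.sh" ;;  # Amazon Linux 2 script
    *)                       echo "unknown" ;;
  esac
}

# Usage (placeholder instance ID):
#   show_user_data i-0123456789abcdef0 | bootstrap_style
```

If the result is "unknown" for a node that should join the cluster, the launch template is worth a closer look.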

Here are some specific troubleshooting steps to resolve this issue:

  1. Verify the UserData configuration for your worker nodes. On Amazon Linux 2023 AMIs, nodeadm reads a NodeConfig document instead of running a script. On Amazon Linux 2 AMIs, the bootstrap script must be present and called with your exact cluster name:
#!/bin/bash
/etc/eks/bootstrap.sh your-cluster-name

  2. Treat the "InvalidDiskCapacity invalid capacity 0" warning as secondary. Kubelet commonly logs it once at startup, before it has collected image filesystem stats, and on its own it does not usually prevent a node from becoming Ready. Investigate disk allocation or filesystem configuration only if the warning keeps repeating.

  3. If the node group was created with eksctl, review the CloudFormation stack events for the node group to identify specific error messages. For a managed node group, also check the health issues that EKS reports on the node group itself.

  4. Ensure your VPC and subnet configurations have the correct DHCP options set, including domain-name and domain-name-servers.

  5. Verify that the AWS STS endpoint for your region is enabled for your account.

  6. Check that security groups allow the necessary traffic between the nodes and the control plane: nodes must reach the cluster API server endpoint on port 443, and the control plane must reach the kubelet on port 10250.

  7. If you're using a managed node group, verify the node group configuration in the EKS console to ensure it belongs to the intended cluster.

  8. Try creating a node group in a different region to determine if the issue is region-specific.

  9. Check the CNI plugin version compatibility with your Kubernetes version. An incompatible or misconfigured CNI plugin can prevent nodes from becoming ready.

  10. Verify that kube-proxy is running correctly by checking its pod status in the kube-system namespace.
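
Several of the checks in this list can be run from a terminal. A minimal sketch, assuming the AWS CLI and kubectl are configured for the cluster. The function names are hypothetical, the cluster and node group names in the usage comments are placeholders, and the k8s-app labels are the ones the default EKS add-ons normally carry, which may differ in a customized cluster.

```shell
#!/bin/bash
# Hypothetical helpers (names are illustrative, not part of the AWS CLI or kubectl).

# Show the health issues EKS reports for a managed node group.
nodegroup_health_issues() {
  local cluster="$1" nodegroup="$2"
  aws eks describe-nodegroup \
    --cluster-name "$cluster" \
    --nodegroup-name "$nodegroup" \
    --query 'nodegroup.health.issues[].[code,message]' \
    --output text
}

# List the kube-system pods of an add-on by its k8s-app label.
system_pods() {
  local app="$1"
  kubectl get pods -n kube-system -l "k8s-app=${app}" -o wide
}

# Usage (placeholder names):
#   nodegroup_health_issues my-cluster my-nodegroup
#   system_pods aws-node      # VPC CNI plugin pods
#   system_pods kube-proxy
```

The health issue code returned for the node group is usually more specific than the console banner and is a reasonable first thing to search for.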

If these steps don't resolve the issue, you may need to collect more detailed logs from the worker nodes to identify the specific cause of the failure.
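
As a starting point for collecting those logs: if the node registered at all, its conditions and warning events can be read through the API server; if it never registered, the kubelet journal on the instance is the place to look. A sketch under those assumptions, with hypothetical function names and a placeholder node name.

```shell
#!/bin/bash
# Hypothetical helpers (names are illustrative).

# Print each condition of a node as TYPE=STATUS, one per line.
node_conditions() {
  local node="$1"
  kubectl get node "$node" \
    -o jsonpath='{range .status.conditions[*]}{.type}={.status}{"\n"}{end}'
}

# List Warning events recorded against Node objects.
node_warning_events() {
  kubectl get events --all-namespaces \
    --field-selector involvedObject.kind=Node,type=Warning
}

# Usage (placeholder node name):
#   node_conditions ip-10-0-1-23.ec2.internal
#   node_warning_events
#
# On the instance itself (for example over SSH or Session Manager):
#   sudo journalctl -u kubelet --no-pager | tail -n 100
```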
Sources
EKS node creation failed | AWS re:Post
EKS Nodes Fail to Create - but why? It worked yesterday? | AWS re:Post
EKS problem after upgrade from 1.23 | AWS re:Post

answered 8 months ago
