How do I troubleshoot Amazon EKS managed node group creation failures?
My Amazon Elastic Kubernetes Service (Amazon EKS) managed node group failed to create. Nodes can't join the cluster, and I received the "Instances failed to join the kubernetes cluster" error.
Resolution
Note: If you receive errors when you run AWS Command Line Interface (AWS CLI) commands, then see Troubleshooting errors for the AWS CLI. Also, make sure that you're using the most recent AWS CLI version.
Run the AWSSupport-TroubleshootEKSWorkerNode automation runbook
Prerequisites: Your worker nodes must have permission to access AWS Systems Manager and Systems Manager must be running. To grant permissions, use the AmazonSSMManagedInstanceCore AWS managed policy. Attach the policy to the AWS Identity and Access Management (IAM) role that corresponds to your Amazon Elastic Compute Cloud (EC2) instance profile. For more information, see the To add instance profile permissions for Systems Manager to an existing role (console) section of Alternative configuration for EC2 instance permissions.
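For example, assuming your node IAM role is named eksNodeRole (a placeholder), you can attach the managed policy with the AWS CLI:

```shell
# Attach the AmazonSSMManagedInstanceCore managed policy to the node IAM role
# so that Systems Manager can manage the worker nodes.
# Replace eksNodeRole with the name of your node instance role (placeholder).
aws iam attach-role-policy \
  --role-name eksNodeRole \
  --policy-arn arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore
```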
To use the AWSSupport-TroubleshootEKSWorkerNode runbook to troubleshoot issues, complete the following steps:
- Open the runbook.
- Make sure that the AWS Region in the AWS Management Console is the same as your Amazon EKS cluster's Region.
Note: Review the Runbook details section of the runbook for more information.
- In the Input parameters section, enter the name of your cluster for ClusterName and your instance ID for WorkerID.
- (Optional) For AutomationAssumeRole, select the IAM role to allow Systems Manager to perform actions. If you don't specify a role, then Systems Manager uses your current IAM entity's permissions to perform the actions in the runbook.
- Choose Execute.
- Check the Outputs to identify why your worker node can't join your cluster and the steps that you can take to resolve the error.
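You can also start the same runbook from the AWS CLI. The cluster name, instance ID, and execution ID below are placeholders:

```shell
# Start the AWSSupport-TroubleshootEKSWorkerNode automation runbook.
# Replace example-cluster and i-01234567890abcdef with your values (placeholders).
aws ssm start-automation-execution \
  --document-name "AWSSupport-TroubleshootEKSWorkerNode" \
  --parameters "ClusterName=example-cluster,WorkerID=i-01234567890abcdef"

# Retrieve the outputs after the execution completes.
# Replace example-execution-id with the AutomationExecutionId from the previous command.
aws ssm get-automation-execution \
  --automation-execution-id example-execution-id \
  --query "AutomationExecution.Outputs"
```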
Check your worker node security group traffic requirements
Confirm that you configured your control plane's security group and worker node security group with the requirements for inbound and outbound traffic. By default, Amazon EKS applies the cluster security group to the instances in your node group to facilitate communication between nodes and the control plane. If you specify custom security groups in the launch template for your managed node group, then Amazon EKS doesn't add the cluster security group.
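To review the security groups in play, a sketch along these lines can help. The cluster name and security group ID are placeholders:

```shell
# Retrieve the cluster security group that Amazon EKS applies to managed nodes.
# Replace example-cluster with your cluster's name (placeholder).
aws eks describe-cluster \
  --name example-cluster \
  --query "cluster.resourcesVpcConfig.clusterSecurityGroupId" \
  --output text

# Inspect the inbound and outbound rules of that security group.
# Replace sg-01234567890abcdef with the ID returned above (placeholder).
aws ec2 describe-security-groups --group-ids sg-01234567890abcdef
```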
Check your worker node's IAM permissions
Verify that you attached the AmazonEKSWorkerNodePolicy and AmazonEC2ContainerRegistryReadOnly policies to the instance IAM role that you associated with your worker node.
Important: It's a best practice to attach the AmazonEKS_CNI_Policy to an IAM role that's associated with the aws-node Kubernetes service account in the kube-system namespace. However, you can attach the policy to the node instance role instead, if needed.
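To confirm which managed policies are attached, you can list them with the AWS CLI. The role name is a placeholder:

```shell
# List the managed policies attached to the node instance role.
# AmazonEKSWorkerNodePolicy and AmazonEC2ContainerRegistryReadOnly must appear.
# Replace eksNodeRole with your node IAM role name (placeholder).
aws iam list-attached-role-policies \
  --role-name eksNodeRole \
  --query "AttachedPolicies[].PolicyName"
```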
Confirm that the Amazon VPC for your cluster has support for a DNS hostname and resolution
After you configure private access for your Amazon EKS cluster endpoint, activate a DNS hostname and DNS resolution for your Amazon Virtual Private Cloud (Amazon VPC). When you activate endpoint private access, Amazon EKS creates an Amazon Route 53 private hosted zone, and then associates it with your cluster's Amazon VPC. For more information, see Cluster API server endpoint.
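A sketch of how to check and activate both VPC attributes with the AWS CLI, assuming a placeholder VPC ID:

```shell
# Verify that DNS support and DNS hostnames are enabled for the cluster VPC.
# Replace vpc-01234567890abcdef with your cluster's VPC ID (placeholder).
aws ec2 describe-vpc-attribute --vpc-id vpc-01234567890abcdef --attribute enableDnsSupport
aws ec2 describe-vpc-attribute --vpc-id vpc-01234567890abcdef --attribute enableDnsHostnames

# If either attribute returns false, turn it on.
aws ec2 modify-vpc-attribute --vpc-id vpc-01234567890abcdef --enable-dns-support "{\"Value\":true}"
aws ec2 modify-vpc-attribute --vpc-id vpc-01234567890abcdef --enable-dns-hostnames "{\"Value\":true}"
```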
Update the aws-auth ConfigMap with your worker nodes' NodeInstanceRole
Verify that you correctly configured the aws-auth ConfigMap with your worker nodes' IAM role instead of the instance profile.
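To inspect the ConfigMap, you can run the following; the account ID and role name in the commented example are placeholders:

```shell
# View the aws-auth ConfigMap; the node IAM role ARN must appear under mapRoles.
kubectl -n kube-system get configmap aws-auth -o yaml

# The mapRoles entry should reference the IAM role (not the instance profile),
# for example (placeholder account ID and role name):
#   mapRoles: |
#     - rolearn: arn:aws:iam::111122223333:role/eksNodeRole
#       username: system:node:{{EC2PrivateDNSName}}
#       groups:
#         - system:bootstrappers
#         - system:nodes
```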
Set the tags for your worker nodes
For the Tag property of your worker nodes, set the key to kubernetes.io/cluster/clusterName and the value to owned. Replace clusterName with the name of your cluster.
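For example, you can apply the tag with the AWS CLI. The cluster name and instance ID are placeholders:

```shell
# Tag the worker node so that it's recognized as part of the cluster.
# Replace example-cluster and i-01234567890abcdef with your values (placeholders).
aws ec2 create-tags \
  --resources i-01234567890abcdef \
  --tags Key=kubernetes.io/cluster/example-cluster,Value=owned
```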
Confirm that the Amazon VPC subnets for the worker node have available IP addresses
If your Amazon VPC runs out of IP addresses, then associate a secondary Classless Inter-Domain Routing (CIDR) block with your existing Amazon VPC. For more information, see View Amazon EKS networking requirements for VPC and subnets.
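To check how many IP addresses remain, you can query the subnets directly. The subnet IDs below are placeholders:

```shell
# Check the available IP addresses in each subnet that the node group uses.
# Replace the subnet IDs with your own (placeholders).
aws ec2 describe-subnets \
  --subnet-ids subnet-01234567890abcdef subnet-0fedcba09876543210 \
  --query "Subnets[].{ID: SubnetId, FreeIPs: AvailableIpAddressCount}" \
  --output table
```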
Confirm that your Amazon EKS worker nodes can reach the API server endpoint for your cluster
You can launch worker nodes in any subnet within your cluster's VPC, or in a peered subnet, if there's a route to the internet through one of the following gateways:
- NAT
- Internet
- Transit
If you launched your worker nodes in a restricted private network, then confirm that your worker nodes can reach the Amazon EKS API server endpoint. Make sure that you meet the requirements to run Amazon EKS in a private cluster without outbound internet access.
Note: You might have nodes in a private subnet that's backed by a NAT gateway. In this scenario, it's a best practice to create the NAT gateway in a public subnet.
If you don't use AWS PrivateLink endpoints, then verify access to API endpoints through a proxy server for the following AWS services:
- Amazon EC2
- Amazon Elastic Container Registry (Amazon ECR)
- Amazon Simple Storage Service (Amazon S3)
To verify that the worker node has access to the API server, use SSH to connect, and then run the following netcat command:
nc -vz 9FCF4EA77D81408ED82517B9B7E60D52.yl4.eu-north-1.eks.amazonaws.com 443
Note: Replace 9FCF4EA77D81408ED82517B9B7E60D52.yl4.eu-north-1.eks.amazonaws.com with your API server endpoint.
To check the kubelet logs before you disconnect from your instance, run the following command:
journalctl -f -u kubelet
If the kubelet logs don't provide information about the source of the issue, then run the following command to check the worker node's kubelet status:
sudo systemctl status kubelet
Review your Amazon EKS logs and the operating system (OS) logs for additional troubleshooting steps.
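To narrow down the review, a quick filter of recent kubelet log entries can surface errors first:

```shell
# Show the 20 most recent kubelet log entries that mention an error or failure.
journalctl -u kubelet --since "1 hour ago" --no-pager | grep -iE "error|fail" | tail -n 20
```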
Verify that the API endpoints can reach your Region
Use SSH to connect to one of the worker nodes, and then run the following commands for each service:
- Amazon EC2:
nc -vz ec2.example-region.amazonaws.com 443
- Amazon ECR:
nc -vz ecr.example-region.amazonaws.com 443
- Amazon S3:
nc -vz s3.example-region.amazonaws.com 443
Note: Replace example-region with the Region for your worker node.
Configure the user data for your worker node
For managed node group launch templates with a specified Amazon Machine Image (AMI), you must supply bootstrap commands for worker nodes to join your cluster. Amazon EKS doesn't merge the default bootstrap commands into your user data. For more information, see Introducing launch template and custom AMI support in Amazon EKS managed node groups.
To configure user data, complete the following steps:
- Run the following describe-cluster AWS CLI command to retrieve the necessary data:
Note: Replace example-clustername with your cluster's name.
aws eks describe-cluster --name example-clustername --query "cluster.{name: name, endpoint: endpoint, certAuth: certificateAuthority.data, serviceIpv4Cidr: kubernetesNetworkConfig.serviceIpv4Cidr}"
- In the output, note your cluster's API server endpoint, certificate authority, and service CIDR.
- Add the following configuration to your user data. For instructions on how to add the configuration, see the Amazon Linux 2023 user data section of Amazon EC2 User Data:
Note: Replace example-clustername with your cluster's name, example-api-server-endpoint with your cluster's API server endpoint, and example-certificate-authority with your cluster's certificate authority data. Also, replace example-service-cidr with your cluster's service CIDR.
---
apiVersion: node.eks.aws/v1alpha1
kind: NodeConfig
spec:
  cluster:
    name: example-clustername
    apiServerEndpoint: example-api-server-endpoint
    certificateAuthority: example-certificate-authority
    cidr: example-service-cidr
---
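For reference, a complete Amazon Linux 2023 user data document wraps the NodeConfig in a MIME multipart message. This is a sketch; every example-* value is a placeholder to replace with your cluster's data:

```
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="BOUNDARY"

--BOUNDARY
Content-Type: application/node.eks.aws

---
apiVersion: node.eks.aws/v1alpha1
kind: NodeConfig
spec:
  cluster:
    name: example-clustername
    apiServerEndpoint: example-api-server-endpoint
    certificateAuthority: example-certificate-authority
    cidr: example-service-cidr

--BOUNDARY--
```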