My worker nodes fail to join my Amazon Elastic Kubernetes Service (Amazon EKS) cluster.
Resolution
To troubleshoot worker nodes that fail to join your Amazon EKS cluster, work through the following checks:
Use Systems Manager automation to identify issues
Run the AWSSupport-TroubleshootEKSWorkerNode automation runbook to identify issues that prevent worker nodes from joining your cluster.
Prerequisites:
- Worker nodes must have the AmazonSSMManagedInstanceCore policy attached to their AWS Identity and Access Management (IAM) role.
- Worker nodes must be running and accessible through AWS Systems Manager.
For more information about running this automation, see AWSSupport-TroubleshootEKSWorkerNode.
Check DNS configuration for your VPC
Complete the following steps:
- Open the Amazon Virtual Private Cloud (Amazon VPC) console.
- In the navigation pane, choose Your VPCs.
- Select your VPC.
- Choose Actions, then choose Edit VPC settings.
- Verify the following settings:
For DNS resolution, confirm it's turned on.
For DNS hostnames, confirm it's turned on.
- In the navigation pane, choose DHCP options sets.
- Select your DHCP options set.
- Verify the following values:
For domain-name, confirm it's set to region.compute.internal, for example us-west-2.compute.internal. In the us-east-1 Region, the value is ec2.internal.
For domain-name-servers, confirm it's set to AmazonProvidedDNS.
For more information about DHCP options sets, see DHCP option sets in Amazon VPC.
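The same settings can also be checked from the command line. The following is a minimal sketch that assumes the AWS CLI is installed and configured with permission to describe the VPC. The function names expected_domain_name and check_vpc_dns are hypothetical helpers, not part of any AWS tool.

```shell
# Print the domain-name value expected in the DHCP options set for a Region.
# us-east-1 uses ec2.internal; other Regions use <region>.compute.internal.
expected_domain_name() (
  region="$1"
  case "$region" in
    us-east-1) printf '%s\n' "ec2.internal" ;;
    *)         printf '%s\n' "${region}.compute.internal" ;;
  esac
)

# Print the two DNS attributes of a VPC (requires AWS CLI credentials).
check_vpc_dns() (
  vpc_id="$1"
  aws ec2 describe-vpc-attribute --vpc-id "$vpc_id" \
    --attribute enableDnsSupport --query 'EnableDnsSupport.Value' --output text
  aws ec2 describe-vpc-attribute --vpc-id "$vpc_id" \
    --attribute enableDnsHostnames --query 'EnableDnsHostnames.Value' --output text
)
```

For example, check_vpc_dns vpc-0123456789abcdef0 (a placeholder ID) prints one line per attribute, and both should report that the attribute is turned on.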
Verify IAM permissions for worker nodes
Complete the following steps:
- Open the IAM console.
- In the navigation pane, choose Roles.
- Search for your worker node IAM role.
- Select the role.
- Choose the Permissions tab.
- Verify the following managed policies are attached:
AmazonEKSWorkerNodePolicy
AmazonEC2ContainerRegistryPullOnly
Note: If you don't use IAM roles for service accounts (IRSA) or EKS Pod Identity for the Amazon VPC CNI add-on, then you must also attach the AmazonEKS_CNI_Policy managed policy to the node role. However, it's a best practice to attach this policy to a separate role that's used specifically for the Amazon VPC CNI add-on.
For more information about creating the node IAM role, see Amazon EKS node IAM role.
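To run the policy check from a terminal instead of the console, a sketch along the following lines might help. It assumes the AWS CLI is configured with permission to read the role. The function names missing_node_policies and list_role_policies are hypothetical, and the required list covers only the two policies named in the preceding steps.

```shell
# Read attached policy names (whitespace separated) on stdin and print any
# required node policy that is missing. Prints nothing when all are present.
missing_node_policies() (
  attached=" $(tr -s '[:space:]' ' ') "
  for policy in AmazonEKSWorkerNodePolicy AmazonEC2ContainerRegistryPullOnly; do
    case "$attached" in
      *" $policy "*) ;;
      *) printf '%s\n' "$policy" ;;
    esac
  done
)

# List the managed policies attached to a role (requires AWS CLI credentials).
list_role_policies() (
  aws iam list-attached-role-policies --role-name "$1" \
    --query 'AttachedPolicies[].PolicyName' --output text
)
```

A possible invocation is list_role_policies my-node-role | missing_node_policies, where my-node-role is a placeholder for your worker node role name.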
Configure authentication for worker nodes
Choose one of the following methods to configure authentication:
Use Access Entries
Access Entries are the recommended method to grant worker nodes access to your cluster.
- Open the Amazon EKS console.
- In the navigation pane, choose Clusters.
- Select your cluster.
- Choose the Access tab.
- In the Access entries section, verify an entry exists for your worker node IAM role ARN.
- If no entry exists, choose Create access entry.
- For IAM principal ARN, enter your worker node IAM role ARN, not the instance profile ARN.
- For Type, choose EC2 Linux or EC2 Windows based on your node type.
- Choose Next, then choose Create.
For more information about Access Entries, see Grant IAM users access to Kubernetes with EKS access entries.
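The console steps above have a command line equivalent. The following sketch assumes the AWS CLI is configured with permission to manage access entries on the cluster. The function names are hypothetical, and the cluster name and role ARN passed to them are placeholders.

```shell
# Map a node operating system to the access entry type used by the EKS API.
access_entry_type() (
  case "$1" in
    linux)   printf '%s\n' "EC2_LINUX" ;;
    windows) printf '%s\n' "EC2_WINDOWS" ;;
    *)       echo "unknown node OS: $1" >&2; exit 1 ;;
  esac
)

# List existing access entries for a cluster (requires AWS CLI credentials).
list_access_entries() (
  aws eks list-access-entries --cluster-name "$1"
)

# Create an access entry for a node role (requires AWS CLI credentials).
# Pass the IAM role ARN, not the instance profile ARN.
create_node_access_entry() (
  cluster="$1"; role_arn="$2"; node_os="$3"
  entry_type="$(access_entry_type "$node_os")" || exit 1
  aws eks create-access-entry --cluster-name "$cluster" \
    --principal-arn "$role_arn" --type "$entry_type"
)
```

For example: create_node_access_entry my-cluster arn:aws:iam::111122223333:role/my-node-role linux.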
Use aws-auth ConfigMap
If your cluster doesn't support Access Entries, then use the aws-auth ConfigMap method.
For more information about configuring the aws-auth ConfigMap, see Grant IAM users access to Kubernetes with a ConfigMap.
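For reference, the following sketch prints the mapRoles entry that a node role typically needs in the aws-auth ConfigMap. The function names are hypothetical, and the role ARN is a placeholder. Compare the output with what kubectl shows for your cluster rather than applying it blindly.

```shell
# Print the mapRoles entry that lets nodes using the given role join the cluster.
node_maproles_entry() (
  role_arn="$1"
  cat <<EOF
- rolearn: ${role_arn}
  username: system:node:{{EC2PrivateDNSName}}
  groups:
    - system:bootstrappers
    - system:nodes
EOF
)

# Show the current aws-auth ConfigMap (requires kubectl access to the cluster).
show_aws_auth() (
  kubectl get configmap aws-auth -n kube-system -o yaml
)
```

Run node_maproles_entry arn:aws:iam::111122223333:role/my-node-role, and then confirm that an equivalent entry appears under mapRoles in the output of show_aws_auth.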
Verify user data configuration
For self-managed nodes, verify the user data includes the correct cluster name and configuration.
The user data must include the bootstrap script with your cluster name:
#!/bin/bash
/etc/eks/bootstrap.sh my-cluster
Note: Replace my-cluster with your actual cluster name.
For more information about bootstrap script configuration, see Node bootstrapping.
Check network connectivity
Complete the following steps:
- Verify worker nodes can reach the cluster API server endpoint.
- For nodes in public subnets, confirm they have public IP addresses assigned.
- For nodes in private subnets, verify the subnet has a route to a NAT gateway.
- Verify that security groups allow the following traffic:
Port 443 (TCP) for cluster API communication
Port 10250 (TCP) for kubelet communication
Port 53 (TCP and UDP) for DNS resolution
For more information about security group requirements, see View Amazon EKS security group requirements for clusters.
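To test reachability of the API server from a worker node, a sketch like the following can be used. It assumes that the AWS CLI and nc (netcat) are installed on the node and that the node's credentials allow eks:DescribeCluster. The function names are hypothetical.

```shell
# Strip the scheme and any path from an endpoint URL, leaving the host name.
endpoint_host() (
  host="${1#*://}"
  printf '%s\n' "${host%%/*}"
)

# From a worker node: look up the cluster endpoint and test TCP port 443.
check_api_reachable() (
  cluster="$1"
  url="$(aws eks describe-cluster --name "$cluster" \
    --query 'cluster.endpoint' --output text)" || exit 1
  host="$(endpoint_host "$url")"
  # nc -z only tests that the port accepts a connection; -w 5 sets a timeout.
  nc -z -w 5 "$host" 443 && echo "reachable: $host:443"
)
```

If check_api_reachable my-cluster prints nothing or times out, then review the route tables and security group rules described in the preceding steps.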
Verify VPC endpoints for private clusters
If your cluster uses private endpoints, then verify the following VPC endpoints exist:
- com.amazonaws.region.ec2
- com.amazonaws.region.ecr.api
- com.amazonaws.region.ecr.dkr
- com.amazonaws.region.s3
- com.amazonaws.region.sts
For more information about private cluster requirements, see Deploy private clusters with limited internet access.
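The endpoint list above can be compared against the endpoints that exist in your VPC. The following sketch assumes the AWS CLI is configured with permission to describe VPC endpoints. The function names are hypothetical, and the check covers only the five services listed in this section.

```shell
# Print the endpoint service names listed above for a Region, one per line.
expected_endpoint_services() (
  region="$1"
  for svc in ec2 ecr.api ecr.dkr s3 sts; do
    printf 'com.amazonaws.%s.%s\n' "$region" "$svc"
  done
)

# Read existing service names on stdin; print expected ones that are missing.
missing_endpoint_services() (
  region="$1"
  existing=" $(tr -s '[:space:]' ' ') "
  expected_endpoint_services "$region" | while read -r svc; do
    case "$existing" in
      *" $svc "*) ;;
      *) printf '%s\n' "$svc" ;;
    esac
  done
)

# List endpoint service names in a VPC (requires AWS CLI credentials).
list_vpc_endpoint_services() (
  aws ec2 describe-vpc-endpoints --filters "Name=vpc-id,Values=$1" \
    --query 'VpcEndpoints[].ServiceName' --output text
)
```

For example, list_vpc_endpoint_services vpc-0123456789abcdef0 | missing_endpoint_services us-west-2 prints any service from the list that has no endpoint in that VPC.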
Check cluster role configuration
Complete the following steps:
- Open the IAM console.
- In the navigation pane, choose Roles.
- Search for your cluster IAM role.
- Select the role.
- Choose the Permissions tab.
- Verify the AmazonEKSClusterPolicy managed policy is attached.
- Choose the Trust relationships tab, and then verify that the trust policy matches the following policy:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "eks.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
Note: If the trust policy doesn't match the preceding policy, then edit the trust policy and paste the preceding policy into the Trusted entities window.
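The trust policy can also be inspected from the command line. The following sketch assumes the AWS CLI is configured with permission to read the role. The function names are hypothetical. The check is a rough text match, not a JSON parser, so treat a pass as a hint and still read the printed policy.

```shell
# Rough textual check (not a JSON parser): succeed when the policy on stdin
# mentions both the EKS service principal and the sts:AssumeRole action.
trust_policy_mentions_eks() (
  policy="$(cat)"
  case "$policy" in *'"eks.amazonaws.com"'*) ;; *) exit 1 ;; esac
  case "$policy" in *'"sts:AssumeRole"'*)   ;; *) exit 1 ;; esac
)

# Print a role's trust policy (requires AWS CLI credentials).
show_trust_policy() (
  aws iam get-role --role-name "$1" \
    --query 'Role.AssumeRolePolicyDocument' --output json
)
```

A possible invocation is show_trust_policy my-cluster-role | trust_policy_mentions_eks && echo "trust policy mentions EKS", where my-cluster-role is a placeholder for your cluster role name.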
Review kubelet logs
To review kubelet logs on a worker node:
- Connect to your worker node using SSH or Session Manager.
- Run the following command:
sudo journalctl -u kubelet -f
- Look for error messages that indicate why the node can't join the cluster.
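When the log stream is noisy, it can help to filter recent entries for lines that look like errors. The following is a sketch only. The function names are hypothetical, and the keyword list is an illustrative assumption rather than a complete set of kubelet failure messages.

```shell
# Print only log lines on stdin that look like errors. The keyword list is an
# illustrative assumption, not an exhaustive set of kubelet failure messages.
filter_kubelet_errors() (
  grep -iE 'error|failed|unauthorized|forbidden|timeout'
)

# On the worker node: scan the last hour of kubelet logs.
recent_kubelet_errors() (
  sudo journalctl -u kubelet --no-pager --since "1 hour ago" | filter_kubelet_errors
)
```

Run recent_kubelet_errors on the node, and then adjust the keywords or the time window as needed.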
Verify AWS STS endpoint is activated
Confirm the AWS Security Token Service (AWS STS) endpoint for your AWS Region is activated for your account.
For more information about activating STS endpoints, see Activating and deactivating AWS STS in an AWS Region.
Check VPC and subnet tagging
Complete the following steps:
- Open the Amazon VPC console.
- In the navigation pane, choose Subnets.
- Select the subnet where your worker nodes are deployed.
- Choose the Tags tab.
- Verify that the following tag exists:
For Key, confirm it's set to kubernetes.io/cluster/my-cluster.
For Value, confirm it's set to shared or owned.
Note: Replace my-cluster with your actual cluster name.
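The tag can also be read from the command line. The following sketch assumes the AWS CLI is configured with permission to describe tags. The function names are hypothetical, and the subnet ID and cluster name are placeholders.

```shell
# Build the subnet tag key for a cluster name.
cluster_tag_key() (
  printf 'kubernetes.io/cluster/%s\n' "$1"
)

# Print the value of the cluster tag on a subnet (requires AWS CLI credentials).
# Empty output means the tag is not present on the subnet.
show_cluster_tag() (
  subnet_id="$1"; cluster="$2"
  key="$(cluster_tag_key "$cluster")"
  aws ec2 describe-tags \
    --filters "Name=resource-id,Values=$subnet_id" "Name=key,Values=$key" \
    --query 'Tags[].Value' --output text
)
```

For example, show_cluster_tag subnet-0123456789abcdef0 my-cluster should print shared or owned when the tag is set.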
Related information
Troubleshoot problems with Amazon EKS clusters and nodes
VPC and Subnet Considerations