
Why did the worker nodes fail to join my Amazon EKS cluster?


I tried to join worker nodes to my Amazon Elastic Kubernetes Service (Amazon EKS) cluster. I received an error message, or the nodes didn't join the cluster.

Short description

When you try to join worker nodes to your Amazon EKS cluster, you might experience one of the following issues:

  • When you create a managed node group in the EKS cluster, the managed node group enters the Create failed state. The worker nodes don't join the EKS cluster, and you receive the error message, "Instances failed to join the kubernetes cluster".
  • The new worker nodes fail to join the EKS cluster when you upgrade the managed node group in the EKS cluster.

Resolution

Note: If you receive errors when you run AWS Command Line Interface (AWS CLI) commands, then see Troubleshooting errors for the AWS CLI. Also, make sure that you're using the most recent AWS CLI version.

Use the Systems Manager automation runbook

Use the AWSSupport-TroubleshootEKSWorkerNode runbook to determine why your worker nodes aren't joining your cluster.

Important: For the automation to work, your worker nodes must have permission to access and run AWS Systems Manager. To grant permission, attach the AmazonSSMManagedInstanceCore AWS managed policy to the AWS Identity and Access Management (IAM) role for your Amazon Elastic Compute Cloud (Amazon EC2) instance profile. Amazon EKS managed node groups that you create through eksctl include the AmazonSSMManagedInstanceCore policy by default. Your cluster name must match the [-a-zA-Z0-9]{1,100}$ pattern.
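
For example, you can attach the policy with the AWS CLI. The following is a minimal sketch that assumes a placeholder role name, eksNodeRole; replace it with the IAM role that your worker nodes' instance profile uses:

# Attach the AmazonSSMManagedInstanceCore managed policy to the node role
# (eksNodeRole is a placeholder; use your own node role's name)
aws iam attach-role-policy \
  --role-name eksNodeRole \
  --policy-arn arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore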

To run the automation, complete the following steps:

  1. On the Systems Manager console, open the AWSSupport-TroubleshootEKSWorkerNode runbook.
     Note: Review the Document details section for more information about the runbook.
  2. Check that the AWS Region is set to the same Region as your cluster.
  3. In the Input parameters section, enter the name of your cluster in the ClusterName field and the Amazon EC2 instance ID in the WorkerID field.
  4. (Optional) In the AutomationAssumeRole field, enter the Amazon Resource Name (ARN) of the IAM role that allows the automation to perform actions for you. If you don't specify an IAM role, then the automation uses the permissions of the user who starts the runbook.
  5. Choose Execute.
  6. Review the Outputs section to determine the cause of the issue and the steps that you can take to resolve it.
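
You can also start the runbook from the AWS CLI. The following sketch assumes a placeholder cluster name, my-cluster, and a placeholder instance ID; replace both with your own values:

# Start the AWSSupport-TroubleshootEKSWorkerNode automation
aws ssm start-automation-execution \
  --document-name "AWSSupport-TroubleshootEKSWorkerNode" \
  --parameters "ClusterName=my-cluster,WorkerID=i-1234567890abcdef0"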

Verify DNS support for your Amazon VPC

Confirm that you turned on the DNS hostnames and DNS resolution in the Amazon Virtual Private Cloud (Amazon VPC) for your EKS cluster.
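
To check and, if needed, turn on these attributes from the AWS CLI, you can run commands similar to the following. This is a sketch that assumes the placeholder VPC ID vpc-0abcd1234efgh5678:

# Check whether DNS support and DNS hostnames are turned on
aws ec2 describe-vpc-attribute --vpc-id vpc-0abcd1234efgh5678 --attribute enableDnsSupport
aws ec2 describe-vpc-attribute --vpc-id vpc-0abcd1234efgh5678 --attribute enableDnsHostnames

# Turn on the attributes if either command returns "Value": false
aws ec2 modify-vpc-attribute --vpc-id vpc-0abcd1234efgh5678 --enable-dns-support '{"Value":true}'
aws ec2 modify-vpc-attribute --vpc-id vpc-0abcd1234efgh5678 --enable-dns-hostnames '{"Value":true}'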

Verify that your worker nodes' instance profile has the correct permissions

Attach the following AWS managed policies to the IAM role that's associated with your worker nodes' instance profile:

  • AmazonEKSWorkerNodePolicy
  • AmazonEC2ContainerRegistryReadOnly
  • AmazonEKS_CNI_Policy

Make sure that a permissions boundary or service control policy (SCP) at the organization or account level doesn't prevent the worker nodes from making the required API calls.
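
To confirm which policies are attached to the role, you can list them with the AWS CLI. This sketch assumes the placeholder role name eksNodeRole:

# List the managed policies attached to the worker node role
aws iam list-attached-role-policies --role-name eksNodeRole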

Configure the user data for your worker nodes

Note: If you use AWS CloudFormation to launch your worker nodes, then you don't need to configure the user data for your worker nodes. Instead, use the CloudFormation console to launch self-managed Amazon Linux nodes.

If you use managed node groups with Amazon EKS optimized Amazon Linux Amazon Machine Images (AMIs) to launch your worker nodes, then you don't need to configure user data. Configure user data only when you use custom AMIs to launch your worker nodes through managed node groups.

If you use Amazon EKS managed node groups with a custom launch template, then specify the correct user data in the launch template. If the Amazon EKS cluster is a fully private cluster that uses VPC endpoints to connect, then you must update the user data. Specify the certificate authority (CA), API server endpoint, and DNS cluster IP address in the user data.

Example user data configuration:

#!/bin/bash
set -ex
B64_CLUSTER_CA=CA-CERT
API_SERVER_URL=ENDPOINT
K8S_CLUSTER_DNS_IP=IP-ADDRESS
/etc/eks/bootstrap.sh ${ClusterName} ${BootstrapArguments} --b64-cluster-ca $B64_CLUSTER_CA --apiserver-endpoint $API_SERVER_URL --dns-cluster-ip $K8S_CLUSTER_DNS_IP

Note: Replace CA-CERT, ENDPOINT, and IP-ADDRESS with the values for your cluster. Also, replace ${ClusterName} with the name of your EKS cluster and ${BootstrapArguments} with additional bootstrap values, if needed.

If you must provide user data to pass arguments to the bootstrap.sh file for the Amazon EKS optimized Linux or Bottlerocket AMI, then specify an AMI ID in the ImageId field of your launch template.

To configure user data for your worker nodes, specify the user data when you launch your EC2 instances.

For example, if you use a third-party tool such as Terraform, then update the user data field to launch your EKS worker nodes.

Example user data configuration:

#!/bin/bash
set -o xtrace
/etc/eks/bootstrap.sh ${ClusterName} ${BootstrapArguments}

Note: Replace ${ClusterName} with the name of your EKS cluster and ${BootstrapArguments} with additional bootstrap values, if needed.

If you use an Amazon Linux 2023 AMI, then add the minimum required parameters to the user data in the following format:

MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="//"

--//
Content-Type: application/node.eks.aws

---
apiVersion: node.eks.aws/v1alpha1
kind: NodeConfig
spec:
  cluster:
    apiServerEndpoint: https://example.com
    certificateAuthority: Y2VydGlmaWNhdGVBdXRob3JpdHk=
    cidr: 10.100.0.0/16
    name: my-cluster

--//--

Verify that you correctly configured the networking for your Amazon VPC subnets and that your worker nodes are in the same Amazon VPC as your EKS cluster

If you use an internet gateway, then confirm that you correctly attached it to the route table.

If you use a NAT gateway, then make sure that you correctly configured it in a public subnet. Also, verify that you correctly configured the route table.
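
One way to verify the routes is to look up the route table that's associated with the worker node subnet. The following sketch assumes the placeholder subnet ID subnet-0abcd1234efgh5678:

# Show the route table for the worker node subnet; check for a
# 0.0.0.0/0 route to an internet gateway (igw-...) or NAT gateway (nat-...)
aws ec2 describe-route-tables \
  --filters "Name=association.subnet-id,Values=subnet-0abcd1234efgh5678"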

If you use VPC private endpoints for a fully private cluster, then confirm that you have the following interface endpoints:

  • com.amazonaws.region.ec2
  • com.amazonaws.region.ecr.api
  • com.amazonaws.region.ecr.dkr
  • com.amazonaws.region.sts

Also, make sure that you have the gateway endpoint, com.amazonaws.region.s3.

You can restrict the Amazon Simple Storage Service (Amazon S3) gateway VPC endpoint policy for Amazon ECR. For more information, see Minimum Amazon S3 bucket permissions for Amazon ECR.

Pods that you configure with IAM roles for service accounts get credentials from an AWS Security Token Service (AWS STS) API call. If there's no outbound internet access, then you must create and use an AWS STS VPC endpoint in your VPC.

The security group for the VPC endpoint must have an inbound rule that allows traffic on port 443. For more information, see Control traffic to your AWS resources using security groups.

Make sure that the policy that's attached to the VPC endpoint has the required permissions to make API calls to the specific service.

In the Networking section of the EKS cluster, identify the subnets that are associated with your cluster. Confirm that they belong to the same VPC.
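
You can also retrieve the cluster's VPC and subnets with the AWS CLI, for example (assuming the placeholder cluster name my-cluster):

# Show the VPC and subnets that the cluster uses
aws eks describe-cluster --name my-cluster \
  --query "cluster.resourcesVpcConfig.{vpcId:vpcId,subnetIds:subnetIds}"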

You must create separate endpoints for each AWS service that you use. For a list of endpoints for common AWS services, see the table in Pod requirements. You can also create an endpoint service based on your use case.
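
For example, you can create an interface endpoint with the AWS CLI. The following is a sketch with placeholder IDs and the us-east-1 Region; adjust the service name, VPC, subnets, and security group for your environment:

# Create an interface endpoint for Amazon EC2 in the cluster VPC
aws ec2 create-vpc-endpoint \
  --vpc-id vpc-0abcd1234efgh5678 \
  --vpc-endpoint-type Interface \
  --service-name com.amazonaws.us-east-1.ec2 \
  --subnet-ids subnet-0abcd1234efgh5678 \
  --security-group-ids sg-0abcd1234efgh5678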

Also, you can configure different subnets to launch your worker nodes in. The subnets must exist in the same Amazon VPC, and you must appropriately tag them. Amazon EKS automatically manages tags only for subnets that you configure during cluster creation. For more information, see Subnet requirements and considerations.

Update the aws-auth ConfigMap with the NodeInstanceRole of your worker nodes

Verify that you correctly configured the aws-auth ConfigMap with your worker node's IAM role and not the instance profile.

Run the following command:

kubectl describe configmap -n kube-system aws-auth

If you didn't correctly configure the aws-auth ConfigMap, then you get the following error message:

"571 reflector.go:153] k8s.io/kubernetes/pkg/kubelet/kubelet.go:458 : Failed to list *v1.Node: Unauthorized"

If you use the EKS API authentication method, then create an access entry for the NodeInstanceRole. For Type, select EC2_LINUX.
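
For example, you can create the access entry with the AWS CLI. This sketch assumes the placeholder cluster name my-cluster and node role eksNodeRole:

# Create an EC2_LINUX access entry for the node instance role
aws eks create-access-entry \
  --cluster-name my-cluster \
  --principal-arn arn:aws:iam::111122223333:role/eksNodeRole \
  --type EC2_LINUX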

Meet the security group requirements of your worker nodes

Confirm that you configured your control plane's security group and worker node security group with required settings for inbound and outbound traffic. Also, confirm that you configured your network access control list (network ACL) rules to allow traffic to and from 0.0.0.0/0 for ports 80, 443, and 1025-65535.
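
For example, to allow inbound traffic from the cluster security group to the worker node security group on the ephemeral port range, you can run a command similar to the following (sg-node1234567890ab and sg-cluster1234567890 are placeholders):

# Allow inbound traffic from the cluster security group to the nodes
aws ec2 authorize-security-group-ingress \
  --group-id sg-node1234567890ab \
  --protocol tcp \
  --port 1025-65535 \
  --source-group sg-cluster1234567890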

Set the tags for your worker nodes

For the Tag property of your worker nodes, set Key to kubernetes.io/cluster/clusterName and set Value to owned.
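
For example, assuming a cluster named my-cluster and the placeholder instance ID i-1234567890abcdef0, you can add the tag with the AWS CLI:

# Tag the worker node so that it's recognized as part of the cluster
aws ec2 create-tags \
  --resources i-1234567890abcdef0 \
  --tags Key=kubernetes.io/cluster/my-cluster,Value=owned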

For more information, see VPC requirements and considerations.

Check whether your worker nodes can reach the API server endpoint for your EKS cluster

Launch worker nodes in a subnet that's associated with a route table that routes to the API endpoint through a NAT or internet gateway. If you launch your worker nodes in a restricted private network, then confirm that your worker nodes can reach the EKS API server endpoint. If you launch worker nodes with an Amazon VPC that uses a custom DNS instead of AmazonProvidedDNS, then the worker nodes might not resolve the endpoint.

Note: When you deactivate public access to the endpoint and activate only private access, the endpoint resolves only from within the VPC. For more information, see Enabling DNS resolution for Amazon EKS cluster endpoints.
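
To test resolution and connectivity from a worker node, you can run commands similar to the following. The endpoint shown is a placeholder; use your cluster's API server endpoint from the Amazon EKS console:

# Resolve and connect to the cluster API server endpoint (placeholder shown)
nslookup ABCDEF1234567890.gr7.us-east-1.eks.amazonaws.com
nc -vz ABCDEF1234567890.gr7.us-east-1.eks.amazonaws.com 443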

Confirm whether kubelet can reach the required endpoints

Run the following commands to test whether kubelet can reach the endpoints:

$ nc -vz ec2.region.amazonaws.com 443
$ nc -vz dkr.ecr.region.amazonaws.com 443
$ nc -vz api.ecr.region.amazonaws.com 443
$ nc -vz s3.region.amazonaws.com 443

Note: Replace region with your Region.

Confirm that you correctly configured the cluster role

You must attach AmazonEKSClusterPolicy to your Amazon EKS cluster IAM role. Also, the trust relationship of your cluster must allow the eks.amazonaws.com service for sts:AssumeRole.

Example trust policy:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "eks.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
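
You can verify both the attached policy and the trust relationship with the AWS CLI, for example (assuming the placeholder role name eksClusterRole):

# Check the attached policies and the trust policy of the cluster role
aws iam list-attached-role-policies --role-name eksClusterRole
aws iam get-role --role-name eksClusterRole \
  --query "Role.AssumeRolePolicyDocument"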

Confirm that you activated Regional STS endpoints

If the cluster is in a Region that supports STS endpoints, then activate the Regional STS endpoint to authenticate the kubelet. The kubelet can then create the node object.
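
To confirm that the worker nodes can reach the Regional STS endpoint, you can test connectivity from a node:

# Test connectivity to the Regional STS endpoint (replace region with your Region)
nc -vz sts.region.amazonaws.com 443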

Make sure that you configured your AMI to work with EKS and the AMI includes the required components

The Amazon EKS optimized Amazon Linux AMI contains the required components to work with the EKS cluster. If your worker nodes' AMI isn't the Amazon EKS optimized Amazon Linux AMI, then confirm that the following Kubernetes components are in the Active state (see the example check after this list):

  • kubelet
  • AWS IAM Authenticator
  • Docker (Amazon EKS version 1.23 and earlier)
  • containerd
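
For example, on a systemd-based node you can check the service states with commands similar to the following (a sketch; unit names can differ by AMI):

# Check that the kubelet and containerd services are active
systemctl status kubelet
systemctl status containerd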

Use SSH to connect to your EKS worker node instance and check kubelet agent logs

Check that you configured the kubelet agent as a systemd service in the EKS worker node instance.

To validate your kubelet logs, run the following command:

journalctl -f -u kubelet

To resolve issues, see Troubleshoot problems with Amazon EKS clusters and nodes.

Use the Amazon EKS log collector script to troubleshoot errors

Use the log files and operating system (OS) logs to troubleshoot issues in your Amazon EKS cluster. Amazon EKS cluster worker nodes store cloud-init initialization logs in /var/log/cloud-init-output.log and /var/log/cloud-init.log.

To use the EKS logs collector script to collect logs, you must use SSH to connect to the worker node that has the issue. Then, run the following script:

curl -O https://raw.githubusercontent.com/awslabs/amazon-eks-ami/master/log-collector-script/linux/eks-log-collector.sh

sudo bash eks-log-collector.sh

Confirm that the Amazon VPC subnets for the worker node have available IP addresses

If your Amazon VPC doesn't have available IP addresses, then you can associate a secondary CIDR block with your existing Amazon VPC. For more information, see View Amazon EKS networking requirements for VPC and subnets.
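
For example, to associate a secondary CIDR block with the placeholder VPC vpc-0abcd1234efgh5678:

# Associate an additional CIDR block with the VPC
aws ec2 associate-vpc-cidr-block \
  --vpc-id vpc-0abcd1234efgh5678 \
  --cidr-block 100.64.0.0/16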
