How do I resolve cluster creation errors in Amazon EKS?
I get service errors when I provision an Amazon Elastic Kubernetes Service (Amazon EKS) cluster using AWS CloudFormation or eksctl.
Consider the following troubleshooting options:
- You receive an error message stating that your targeted Availability Zone doesn't have sufficient capacity to support the cluster. Complete the steps in the Recreate the cluster in a different Availability Zone section.
- You receive an error message stating that resource creation failed. Complete the steps in the Confirm that you have the correct IAM permissions to create a cluster section, or the Monitor your Amazon VPC resources section.
- You receive an error message stating that the creation timed out when waiting for worker nodes. Complete the steps in the Confirm that your worker nodes can reach the control plane API endpoint section.
Recreate the cluster in a different Availability Zone
If you launch control plane instances in an Availability Zone with limited capacity, then you can receive an error that's similar to the following:
Cannot create cluster 'sample-cluster' because us-east-1d, the targeted availability zone, does not currently have sufficient capacity to support the cluster. Retry and choose from these availability zones: us-east-1a, us-east-1b, us-east-1c
To resolve the preceding error, create the cluster again using the recommended Availability Zones from the error message.
If you're provisioning the cluster using CloudFormation, then in the Subnets parameter add subnet values that match the Availability Zones.
If you're using eksctl, then use the --zones flag to add the values for the different Availability Zones. For example:
$ eksctl create cluster 'sample-cluster' --zones us-east-1a,us-east-1b,us-east-1c
Note: Replace sample-cluster with your cluster name. Replace us-east-1a, us-east-1b, and us-east-1c with your Availability Zones.
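For the CloudFormation Subnets parameter, it can help to list which subnets in your Amazon VPC are in the recommended Availability Zones. The following is a minimal sketch that assumes the AWS CLI is installed and configured. The function name list_subnets_in_zones and the VPC ID in the usage comment are hypothetical and aren't part of the original procedure.

```shell
# List subnet IDs and their Availability Zones for one VPC,
# restricted to the zones that the error message recommended.
# Usage: list_subnets_in_zones <vpc-id> <comma-separated-zones>
list_subnets_in_zones() {
  aws ec2 describe-subnets \
    --filters "Name=vpc-id,Values=$1" \
              "Name=availability-zone,Values=$2" \
    --query 'Subnets[].[SubnetId,AvailabilityZone]' \
    --output text
}

# Example call (placeholder VPC ID):
# list_subnets_in_zones vpc-0123456789abcdef0 us-east-1a,us-east-1b,us-east-1c
```

Each output line pairs a subnet ID with its Availability Zone, so you can choose subnet values that match the recommended zones.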
Confirm that you have the correct IAM permissions to create a cluster
Before you create a cluster, verify that you have the correct AWS Identity and Access Management (IAM) permissions. Also verify that the Amazon EKS service IAM role has the correct policies.
You can use eksctl to create the prerequisite resources for your cluster, such as the IAM roles and security groups. The required minimum permissions depend on the eksctl configuration that you're launching. For more information, see troubleshooting solutions from the eksctl GitHub community.
If your cluster has issues with IAM permissions, then you can receive an error in eksctl that's similar to the following:
API: iam:CreateRole User: arn:aws:iam::your-account-id:user/your-user-name is not authorized to perform: iam:CreateRole on resource: arn:aws:iam::your-account-id:role/eksctl-newtest22-cluster-ServiceRole-10NXBYLSN4ULP
To resolve the preceding error, review the minimum IAM policies for running eksctl use cases on the eksctl website. Also, see Identity and Access Management for Amazon EKS, and How can I troubleshoot access denied or unauthorized operation errors with an IAM policy?
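Before you retry, you can check which identity the AWS CLI and eksctl use, and whether that identity is allowed to call the action named in the error. The following is a sketch under a few assumptions: the AWS CLI is configured, the identity is an IAM user or role (the simulation doesn't accept an assumed-role session ARN), and the caller has the iam:SimulatePrincipalPolicy permission. The function names caller_arn and simulate_actions are hypothetical.

```shell
# Print the ARN of the identity behind the current AWS credentials.
caller_arn() {
  aws sts get-caller-identity --query Arn --output text
}

# Simulate whether a principal is allowed to call the given actions.
# Usage: simulate_actions <user-or-role-arn> <action> [<action> ...]
simulate_actions() {
  principal_arn="$1"
  shift
  aws iam simulate-principal-policy \
    --policy-source-arn "$principal_arn" \
    --action-names "$@" \
    --query 'EvaluationResults[].[EvalActionName,EvalDecision]' \
    --output text
}

# Example call:
# simulate_actions "$(caller_arn)" iam:CreateRole
```

A decision other than allowed for iam:CreateRole is consistent with the preceding error. The simulation evaluates only the actions that you pass in, so it doesn't replace a review of the minimum IAM policies for your eksctl use case.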
Monitor your Amazon VPC resources
When you create a cluster, eksctl creates a new Amazon Virtual Private Cloud (Amazon VPC) by default. If you don't want eksctl to create a new Amazon VPC, then you must specify your custom Amazon VPC and subnets in the configuration file.
If your cluster has issues with your Amazon VPC limits, then you can receive the following error message:
The maximum number of VPCs has been reached. (Service: AmazonEC2; Status Code: 400; Error Code: VpcLimitExceeded; Request ID: a12b34cd-567e-890-123f-ghi4j56k7lmn)
To resolve the preceding error, monitor your resources. For example, check the number of Amazon VPCs in your AWS Region or the internet gateways per Region where you create the cluster. For more information, see Amazon VPC quotas.
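One way to check current usage against your quotas is with the AWS CLI. The following sketch assumes that the AWS CLI is configured and that your identity can call the Amazon EC2 describe operations and the Service Quotas list operation. The function names are hypothetical.

```shell
# Count the VPCs that exist in a Region.
# Usage: count_vpcs <region>
count_vpcs() {
  aws ec2 describe-vpcs --region "$1" \
    --query 'length(Vpcs)' --output text
}

# Count the internet gateways that exist in a Region.
# Usage: count_internet_gateways <region>
count_internet_gateways() {
  aws ec2 describe-internet-gateways --region "$1" \
    --query 'length(InternetGateways)' --output text
}

# List the Amazon VPC quotas that apply to the account in a Region,
# with each quota's name, current value, and quota code.
# Usage: show_vpc_quotas <region>
show_vpc_quotas() {
  aws service-quotas list-service-quotas --service-code vpc --region "$1" \
    --query 'Quotas[].[QuotaName,Value,QuotaCode]' --output table
}

# Example calls:
# count_vpcs us-east-1
# show_vpc_quotas us-east-1
```

If a count equals the value of the matching quota, then the next create request for that resource type fails with a limit error like the preceding one.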
For issues related to resource constraints on the number of Amazon VPC resources in your Region, consider one of the following options:
(Option 1) Use an existing Amazon VPC to overcome resource constraints
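Create a configuration file that specifies the Amazon VPC and subnets where you want to provision your cluster's worker nodes. Then, pass the file to eksctl:
$ eksctl create cluster -f cluster.yaml
Note: When you use a configuration file, set the cluster name in the metadata.name field of the file. eksctl doesn't accept a cluster name argument together with the -f flag.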
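As an illustration, a configuration file that reuses an existing Amazon VPC might look like the following sketch. It assumes the eksctl.io/v1alpha5 ClusterConfig schema. The VPC ID, subnet IDs, node group name, instance type, and node count are placeholders, not values from this article. The sketch also assumes that the private subnets have a route to the cluster endpoint, for example through a NAT gateway.

```shell
# Write a minimal eksctl configuration file that reuses an existing VPC.
# All IDs and names below are placeholders. Replace them with your values.
cat > cluster.yaml <<'EOF'
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: sample-cluster
  region: us-east-1

vpc:
  id: vpc-0123456789abcdef0
  subnets:
    private:
      us-east-1a:
        id: subnet-0aaaaaaaaaaaaaaaa
      us-east-1b:
        id: subnet-0bbbbbbbbbbbbbbbb

nodeGroups:
  - name: ng-1
    instanceType: m5.large
    desiredCapacity: 2
    privateNetworking: true
EOF
```

Because the file names an existing VPC under vpc.id, eksctl uses that VPC instead of creating a new one, so the cluster creation doesn't count against the VPCs per Region quota.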
(Option 2) Request a service quota increase to overcome resource constraints
Request a service quota increase for the resources that failed to create, as shown in the CloudFormation stack events of the cluster that eksctl provisioned.
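The following sketch shows one way to find the failed resources and then request an increase with the AWS CLI. It assumes that eksctl named the cluster stack eksctl-<cluster-name>-cluster, which matches the resource name pattern in the IAM error shown earlier in this article. Verify the stack name in your account. The function names are hypothetical.

```shell
# List the resources that failed to create in the eksctl cluster stack,
# together with the reason that CloudFormation recorded.
# Assumption: the stack is named eksctl-<cluster-name>-cluster.
# Usage: failed_stack_events <cluster-name>
failed_stack_events() {
  aws cloudformation describe-stack-events \
    --stack-name "eksctl-$1-cluster" \
    --query "StackEvents[?ResourceStatus=='CREATE_FAILED'].[LogicalResourceId,ResourceStatusReason]" \
    --output text
}

# Request a higher value for one quota.
# Usage: request_quota_increase <service-code> <quota-code> <desired-value>
request_quota_increase() {
  aws service-quotas request-service-quota-increase \
    --service-code "$1" --quota-code "$2" --desired-value "$3"
}

# Example calls (take the quota code from the Service Quotas console
# or from the output of "aws service-quotas list-service-quotas"):
# failed_stack_events sample-cluster
# request_quota_increase vpc <quota-code> 10
```

Quota increases aren't always applied immediately, so retry the cluster creation after the request is approved.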
Confirm that your worker nodes can reach the control plane API endpoint
When eksctl deploys your cluster, it waits for the launched worker nodes to join the cluster and reach Ready status. If your worker nodes can't reach the control plane or have an IAM role that isn't valid, then you can receive the following error:
timed out (after 25m0s) waiting for at least 4 nodes to join the cluster and become ready in "eksfbots-ng1"
To resolve the preceding error, get your worker nodes to join the cluster, and confirm that your worker nodes are in Ready status.
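A few read-only checks can help narrow down why nodes didn't join. The following sketch assumes that kubectl is configured for the cluster, and that the AWS CLI and eksctl are installed. The function names are hypothetical. The identity mapping check applies only if the cluster authorizes nodes through the aws-auth ConfigMap.

```shell
# Show the nodes that have registered with the cluster and their status.
show_nodes() {
  kubectl get nodes -o wide
}

# Show whether the cluster API endpoint allows public or private access,
# and which CIDR ranges can reach the public endpoint.
# Usage: show_endpoint_access <cluster-name>
show_endpoint_access() {
  aws eks describe-cluster --name "$1" \
    --query 'cluster.resourcesVpcConfig.{publicAccess: endpointPublicAccess, privateAccess: endpointPrivateAccess, publicCidrs: publicAccessCidrs}' \
    --output json
}

# Show the IAM roles and users that are mapped to Kubernetes identities.
# The node instance role must appear here for nodes to join.
# Usage: show_identity_mappings <cluster-name>
show_identity_mappings() {
  eksctl get iamidentitymapping --cluster "$1"
}

# Example calls:
# show_nodes
# show_endpoint_access sample-cluster
# show_identity_mappings sample-cluster
```

If show_nodes returns no nodes, then compare the endpoint access settings with the network path from the node subnets, and confirm that the node instance role is mapped.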