What could be causing EKS nodes to fail to join a cluster on a specific account?

I'm running into an unusual issue where, on one specific new AWS account, I cannot create any nodegroups whatsoever. I've tried on two other AWS accounts and I can make nodegroups without any problems.

The nodegroup creation always fails with something like the following:

NodeCreationFailure

Instances failed to join the kubernetes cluster

DUMMY_2f2298a2-f492-439a-b7bb-ff931c539d78 DUMMY_5651ecbb-690e-4f3e-bc28-c52dc0d95bca DUMMY_6db1e73c-a1c7-4258-b10d-f6994864c3ef DUMMY_93f8d481-afd5-4811-ae28-aa2c50bd3ef5 DUMMY_950c3c89-d7ef-489d-8023-bc88a3b8a99c DUMMY_a5e09b94-4c86-4d0b-bb12-b9630ee544de DUMMY_bab43e87-11f8-4747-908a-06ae3741c612 DUMMY_c3f7c48a-4138-48d4-ba15-894a33f2d90a DUMMY_cccca0c7-98ae-4bf7-8441-8124971e8a78 DUMMY_d9909a43-ebf5-4340-99f0-47281499b2e2 DUMMY_daa1703a-8032-4fa5-9eae-c8a0b04fc1dd DUMMY_f3d0c7e8-b265-4927-98d4-33f7d4cd5ace
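In case the exact failure detail is useful, the health issues can be pulled off the failed nodegroup (while it still exists) with something like the following; the cluster and nodegroup names here are placeholders:

# Inspect the health issues reported on a failed managed nodegroup
aws eks describe-nodegroup \
  --cluster-name my-cluster \
  --nodegroup-name my-nodegroup \
  --query 'nodegroup.health.issues'

This just returns the same NodeCreationFailure message and the list of instance IDs shown above.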

This occurs whether I use eksctl to create a new cluster from scratch with nodegroups (both when I specify the nodegroup configuration and when I let it use the defaults for the initial nodegroup), use eksctl to add a nodegroup to an existing cluster, or create a nodegroup on an existing cluster through the AWS web console. I've tried all of these on the other accounts and succeeded every time. I've also tried both us-west-1 and us-west-2: no success on the affected account, and nothing but success on the other accounts.
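For context, the eksctl invocations are nothing exotic; they're roughly the following (cluster/nodegroup names, region, and instance type are placeholders, not my exact values):

# New cluster with a default managed nodegroup (also tried with an explicit nodegroup config)
eksctl create cluster --name test-cluster --region us-west-2

# Adding a managed nodegroup to an existing cluster
eksctl create nodegroup --cluster test-cluster --region us-west-2 \
  --name ng-1 --node-type t3.medium --nodes 2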

I looked up common sources of this issue (https://docs.aws.amazon.com/eks/latest/userguide/troubleshooting.html), and none of the suggested fixes have helped. The IAM roles created with each nodegroup (before they're deleted when the creation fails) look identical to the ones on working accounts, and they have the AmazonEKSWorkerNodePolicy, AmazonEC2ContainerRegistryReadOnly, and AmazonEKS_CNI_Policy managed policies attached. I even created an IAM role myself with those three policies and used it to make a nodegroup through the web console, and it still failed.
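The standalone role I tested was roughly the CLI equivalent of the following (the role name is a placeholder; I actually created it in the console):

# Node role with a trust policy letting EC2 assume it
aws iam create-role --role-name eks-node-role-test \
  --assume-role-policy-document '{"Version":"2012-10-17","Statement":[{"Effect":"Allow","Principal":{"Service":"ec2.amazonaws.com"},"Action":"sts:AssumeRole"}]}'

# Attach the same three managed policies the working accounts use
aws iam attach-role-policy --role-name eks-node-role-test --policy-arn arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy
aws iam attach-role-policy --role-name eks-node-role-test --policy-arn arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly
aws iam attach-role-policy --role-name eks-node-role-test --policy-arn arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy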

The VPCs these clusters use are configured for IPv4, not IPv6. Each VPC's main security group allows all outbound traffic, and since the VPCs were set up by eksctl, they have two public and two private subnets, with the public subnets auto-assigning public IP addresses, so they should have public internet access. The managed nodegroups created when I spin up a new cluster with eksctl seem to only use the public subnets, so they should definitely have outbound internet access.
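The public-subnet checks amounted to something like this (the VPC ID is a placeholder):

# Confirm the public subnets auto-assign public IPs
aws ec2 describe-subnets --filters Name=vpc-id,Values=vpc-0123456789abcdef0 \
  --query 'Subnets[].{Id:SubnetId,Public:MapPublicIpOnLaunch}'

# Confirm the route tables have a 0.0.0.0/0 route (to the internet gateway for the public subnets)
aws ec2 describe-route-tables --filters Name=vpc-id,Values=vpc-0123456789abcdef0 \
  --query 'RouteTables[].Routes[?DestinationCidrBlock==`0.0.0.0/0`]'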

The IAM identity I'm using has the AdministratorAccess policy on this account.

I'm running out of ideas as to how to solve this. It really seems to be tied to this account, but I can't figure out what's causing this very specific problem.

1 Answer

A few questions:

Is this an existing VPC, or are you letting eksctl create the VPC? It sounds like an existing VPC since you mentioned checking the security group. Could there be a network ACL applied at the subnet level that blocks outbound access? (I did this to myself once, was doing some testing and forgot to remove them.)

The EKS endpoint is public by default, so do you have the necessary routing in place (IGW, NAT gateway, etc.)?

Is this account part of an organization with some policy in place that limits either EKS or EC2 creation? (Quick CLI checks for the NACL and organization questions are sketched below.)
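A couple of checks you could run; the VPC ID and account ID below are placeholders, and the Organizations call has to be made from the organization's management account:

# Look for network ACL entries on the VPC that could block egress
aws ec2 describe-network-acls --filters Name=vpc-id,Values=vpc-0123456789abcdef0 \
  --query 'NetworkAcls[].{Id:NetworkAclId,Egress:Entries[?Egress==`true`]}'

# List any service control policies attached to the affected account
aws organizations list-policies-for-target \
  --target-id 111122223333 --filter SERVICE_CONTROL_POLICY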

You could also run the troubleshooting runbook to see if that sheds any light: https://docs.aws.amazon.com/systems-manager-automation-runbooks/latest/userguide/automation-awssupport-troubleshooteksworkernode.html
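If you'd rather kick it off from the CLI than the console, it should be roughly the following, run against one of the instances while it still exists (cluster name and instance ID are placeholders; parameter names are as documented at the link above):

# Start the EKS worker node troubleshooting runbook against one instance
aws ssm start-automation-execution \
  --document-name "AWSSupport-TroubleshootEKSWorkerNode" \
  --parameters "ClusterName=my-cluster,WorkerID=i-0123456789abcdef0"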
