Nodes Unable to Join Cluster


Hello everyone, I am new to EKS and AWS in general, and I have a question as to why my nodes are always unable to join the cluster.

I have spent quite a bit of time debugging, or at least trying to debug, because I have a specific use case that I would like to test.

The use case: I have an EKS cluster on version 1.28 and would like to deploy a node group using a launch template. Later I would like to test whether upgrading the cluster to version 1.29 also updates the worker nodes automatically or not (I assume not, but that is not the point of this question right now).
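As far as I understand, one way to check that later is to compare the API server version with the kubelet version reported by each node; a minimal kubectl sketch, assuming kubeconfig already points at the cluster:

kubectl version        # reports the API server (control plane) version
kubectl get nodes      # the VERSION column shows each worker node's kubelet version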

Since I have no prior experience with Kubernetes or EKS, I first watched some videos and read articles to understand the concepts and fundamentals; I think I have that covered for now.

  1. The first thing I did after that was create a cluster through the AWS Management Console in a default VPC that was already being used by other people (mistake 1).

  2. On my second try I created a new VPC with 2 subnets, 1 route table and 1 internet gateway, all via the AWS Management Console. After that cluster was deployed I deployed the node group, still using the Management Console, and I think it worked; I forgot to document that part.

  3. The next step was to use the template: I created a launch template where I specified the instance type t2.micro and the AMI Ubuntu 20.04 EKS 1.28 20240322. When trying to deploy it, it would fail and say that the instances failed to join the Kubernetes cluster (the exact failure reason can be pulled from the node group's health status, as sketched after this list).
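For reference, the exact reason behind the "failed to join" message can usually be read from the managed node group's health status; a hedged AWS CLI sketch, with the cluster and node group names as placeholders:

aws eks describe-nodegroup \
  --cluster-name my-cluster \
  --nodegroup-name my-mng \
  --query 'nodegroup.health.issues'

This typically lists an issue code such as NodeCreationFailure together with the affected instance IDs.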

I tried to look through some Documents and User Guides: https://docs.aws.amazon.com/eks/latest/userguide/setting-up.html

I found out that a lot of people create clusters and node groups using the eksctl CLI, so I tried it. I installed it in CloudShell and tried to use eksctl to create the node group, but I ran into issues because it didn't know which cluster to use and I didn't know how to point it at the existing cluster (I think; it's been really confusing for me).
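What I understand now, and am sketching here in case someone trips over the same thing: eksctl can be pointed at an existing cluster with the --cluster flag (the names and region below are the same placeholders used later in this post):

eksctl create nodegroup \
  --cluster my-cluster \
  --region region-code \
  --name my-mng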

So after reading some more articles and going back and forth with it: https://docs.aws.amazon.com/eks/latest/userguide/troubleshooting.html https://docs.aws.amazon.com/eks/latest/userguide/network_reqs.html https://docs.aws.amazon.com/eks/latest/userguide/getting-started-console.html https://eksctl.io/usage/launch-template-support/ https://eksctl.io/usage/nodegroup-managed/

even at one point thinking about doing it in Terraform: https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/eks_node_group#launch_template https://wangpp.medium.com/terraform-eks-nodegroups-with-custom-launch-templates-5b6a199947f

  1. I restarted from the beginning, this time only using eksctl in CloudShell. I created the cluster with eksctl, which created 3 public subnets, 3 private subnets, 1 internet gateway and 1 NAT gateway with the appropriate security groups, as far as I could see when I ran the command and checked the CloudFormation stack it produced. I created the cluster without a node group, because I want to learn exactly how it works. In the following step I tried to use the template I had created earlier, but it didn't work, so I went back to the documentation and found out that I needed to specify the right security group at the launch template level in order to be able to use eksctl create nodegroup --config-file eks-nodegroup.yaml

eks-nodegroup.yaml:

apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: my-cluster
  region: region-code
managedNodeGroups:
- name: my-mng
  launchTemplate:
    id: lt-id
    version: "1"
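The lt-id and version in that file are placeholders. In case it is useful, the real template ID can be looked up, and the security group that EKS creates for node-to-control-plane traffic can be fetched, roughly like this (the launch template name below is made up):

aws ec2 describe-launch-templates \
  --launch-template-names my-launch-template \
  --query 'LaunchTemplates[].{Id:LaunchTemplateId,LatestVersion:LatestVersionNumber}'

aws eks describe-cluster \
  --name my-cluster \
  --query 'cluster.resourcesVpcConfig.clusterSecurityGroupId' \
  --output text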

After that I deployed it, checked the stack it created for the node group, and saw that it would automatically deploy the node groups in the public subnets, so it should be fine to communicate with the control plane, right?
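To double-check that assumption, two things seemed worth verifying (the cluster name and subnet IDs below are placeholders): whether the cluster endpoint allows public access, and whether the public subnets auto-assign public IPs so the instances can actually reach it:

aws eks describe-cluster \
  --name my-cluster \
  --query 'cluster.resourcesVpcConfig.{PublicAccess:endpointPublicAccess,PrivateAccess:endpointPrivateAccess}'

aws ec2 describe-subnets \
  --subnet-ids subnet-aaa subnet-bbb \
  --query 'Subnets[].{Id:SubnetId,AutoAssignPublicIp:MapPublicIpOnLaunch}'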

But no, it still said that the nodes were unable to join the cluster, so I destroyed the stack that created the node group and tried to create a node group using

eksctl create nodegroup \
  --cluster my-cluster \
  --region region-code \
  --name my-mng \
  --node-ami-family ami-family \
  --node-type m5.large \
  --nodes 3 \
  --nodes-min 2 \
  --nodes-max 4 \
  --ssh-access \
  --ssh-public-key my-key

and that worked: the nodes joined the cluster and are running fine as of now. I checked the stack and compared the launch template it had created with mine, and saw that mine didn't have the correct security groups (because I had created a few different versions). I also saw that the tags were missing, so I added them manually to the launch template; other than that I have not really found many differences.
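One way to do that comparison from the CLI, dumping the relevant parts of each template version (the template ID is a placeholder):

aws ec2 describe-launch-template-versions \
  --launch-template-id lt-0123456789abcdef0 \
  --versions '$Latest' \
  --query 'LaunchTemplateVersions[].LaunchTemplateData.[SecurityGroupIds,NetworkInterfaces,TagSpecifications]'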

I am trying to follow this article, but I don't understand the user data part: whether it applies to my use case or not, and what exactly I need to do. https://docs.aws.amazon.com/eks/latest/userguide/launch-templates.html#launch-template-basics
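From what I understand of that page, the user data part does apply here: when a launch template specifies its own AMI (as mine does with the Ubuntu EKS image), EKS does not merge in its usual bootstrap user data, so the template has to provide user data that joins the node to the cluster itself. A rough sketch of what that could look like, assuming the AMI ships the standard /etc/eks/bootstrap.sh script; the cluster name, endpoint and CA values are placeholders:

MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="==MYBOUNDARY=="

--==MYBOUNDARY==
Content-Type: text/x-shellscript; charset="us-ascii"

#!/bin/bash
set -ex
# Placeholder values; the real endpoint and certificate come from
# aws eks describe-cluster --name my-cluster --query 'cluster.[endpoint,certificateAuthority.data]'
/etc/eks/bootstrap.sh my-cluster \
  --apiserver-endpoint https://EXAMPLE1234.gr7.region-code.eks.amazonaws.com \
  --b64-cluster-ca BASE64-ENCODED-CA-CERT

--==MYBOUNDARY==--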

  • If your nodes are unable to connect to the cluster, there could be several reasons for this. Check the security group rules, subnet configurations, and IAM permissions associated with your nodes. Also, ensure that the EKS cluster is accessible from the public internet and that there are no network issues blocking communication.

  • Hello, I tried to create a managed node group without a launch template and it worked, so I assume that I have an issue with my launch template, but I don't know what exactly, because I looked at the one it generated on its own and the only difference I see is the user data (one way to dump it is sketched below).
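One way to dump and base64-decode the user data of the auto-generated template for that comparison (the template ID is a placeholder):

aws ec2 describe-launch-template-versions \
  --launch-template-id lt-0123456789abcdef0 \
  --versions '$Latest' \
  --query 'LaunchTemplateVersions[0].LaunchTemplateData.UserData' \
  --output text | base64 --decode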

1 Answer

answered 18 days ago by an EXPERT; reviewed by Artem (EXPERT) 17 days ago
  • The issue is not at the cluster level but rather at the node group level, where I am trying to use a template that I created myself, because I want to specify a specific AMI.
