Worker Node group doesn't join the EKS cluster

I have followed this blog to set up open5GS on AWS: https://aws.amazon.com/blogs/opensource/open-source-mobile-core-network-implementation-on-amazon-elastic-kubernetes-service/

  1. I've set up the infrastructure using open5gs-infra.yaml
  2. I've configured the bastion host and run step 5 properly (by providing the correct ARN value)
  3. I've initialised the DocumentDB
  4. I updated the CoreDNS configmap and restarted coredns pods
  5. I then ran the CloudFormation YAML file that creates the worker node group
  6. However, the worker node group doesn't join the cluster. I've double-checked the parameters that I pass to the CloudFormation template. I've even tried editing the authConfig manually after the worker node group was created so that the worker nodes could join the cluster, but that doesn't work either.
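
For reference when editing the node mapping by hand, here is a minimal sketch that renders the `mapRoles` entry a self-managed node group usually needs in the `aws-auth` ConfigMap. It assumes that "authConfig" above refers to that ConfigMap; the function name `node_role_entry` and the example ARN are hypothetical. The two checks reflect common mistakes: pasting the instance profile ARN instead of the node instance role ARN, and leaving a path in the role ARN.

```python
import re

# Matches IAM role ARNs in any partition (aws, aws-cn, aws-us-gov).
ROLE_ARN = re.compile(r"^arn:aws[\w-]*:iam::\d{12}:role/(?P<name>.+)$")


def node_role_entry(role_arn: str) -> str:
    """Render a mapRoles entry for a worker-node IAM role (hypothetical helper)."""
    match = ROLE_ARN.match(role_arn)
    if match is None:
        # An instance-profile ARN ends up here; aws-auth needs the role ARN.
        raise ValueError(f"not an IAM role ARN: {role_arn}")
    if "/" in match.group("name"):
        # aws-auth expects the role ARN without a path component.
        raise ValueError(f"role ARN contains a path: {role_arn}")
    return "\n".join([
        f"- rolearn: {role_arn}",
        # The double braces are literal: EKS substitutes the node's DNS name.
        "  username: system:node:{{EC2PrivateDNSName}}",
        "  groups:",
        "    - system:bootstrappers",
        "    - system:nodes",
    ])


if __name__ == "__main__":
    # Placeholder ARN; use the NodeInstanceRole output of your own stack.
    print(node_role_entry("arn:aws:iam::123456789012:role/example-node-role"))
```

The printed text can be compared against the output of `kubectl describe configmap aws-auth -n kube-system` to see whether the entry for the node role is present and well-formed.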

Since there are no worker nodes, the pods can't be scheduled and the cluster is unusable. What can I do so that the worker node group joins the cluster?

2 Answers

Hello there, thank you for providing the details.

There can be several reasons why a worker node cannot join an EKS cluster. For example:

In the DHCP options set of your cluster's VPC, the parameter domain-name-servers should be set to AmazonProvidedDNS.

If you are using public subnets, the subnet setting "Auto-assign public IP" must be enabled.

The worker node security group must allow communication with the control plane.

These are only some of the possible causes.
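
As an illustration, the first two checks can be sketched in Python against the response shapes returned by boto3's `describe_dhcp_options` and `describe_subnets` calls. This is a rough sketch, not an official diagnostic: the helper names and the sample data are hypothetical, and the functions only inspect dictionaries you pass in, so they run without AWS credentials.

```python
def uses_amazon_dns(dhcp_options: dict) -> bool:
    """True if a DHCP options set lists AmazonProvidedDNS as a name server."""
    for config in dhcp_options.get("DhcpConfigurations", []):
        if config.get("Key") == "domain-name-servers":
            values = config.get("Values", [])
            return any(v.get("Value") == "AmazonProvidedDNS" for v in values)
    return False


def subnets_without_public_ip(subnets: list) -> list:
    """Return the IDs of subnets where auto-assign public IP is disabled."""
    return [s["SubnetId"] for s in subnets if not s.get("MapPublicIpOnLaunch", False)]


if __name__ == "__main__":
    # Sample data in the boto3 response shape; IDs are placeholders.
    sample_dhcp = {
        "DhcpConfigurations": [
            {"Key": "domain-name-servers", "Values": [{"Value": "AmazonProvidedDNS"}]}
        ]
    }
    sample_subnets = [
        {"SubnetId": "subnet-aaa", "MapPublicIpOnLaunch": True},
        {"SubnetId": "subnet-bbb", "MapPublicIpOnLaunch": False},
    ]
    print(uses_amazon_dns(sample_dhcp))
    print(subnets_without_public_ip(sample_subnets))
```

With real data, each element of `ec2.describe_dhcp_options()["DhcpOptions"]` can be passed to the first function and `ec2.describe_subnets()["Subnets"]` to the second. The public IP check only matters for nodes placed in public subnets.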

I would encourage you to check this document - https://aws.amazon.com/premiumsupport/knowledge-center/eks-worker-nodes-cluster/

If you still have issues, please reach out to AWS Premium Support. Thank you.

RaghavK
answered 2 months ago

Thank you for your response! I'll contact AWS premium support for further diagnosis.

  1. I've used this runbook to troubleshoot the issue: https://docs.aws.amazon.com/systems-manager-automation-runbooks/latest/userguide/automation-awssupport-troubleshooteksworkernode.html

  2. I got one error. The security group rules applied to the cluster were highly restrictive and did not allow traffic to flow from the worker nodes to the cluster. This was the only error. All the other tests passed.

  3. I modified the security group to allow all inbound traffic from everywhere. I re-ran the runbook and the error was gone. I then redeployed my worker node group, but the nodes still didn't join the cluster.

  4. I used network path analyser in AWS VPC to test 3 paths:
     a. user_plane worker node as the source, control_plane worker node as the destination
     b. control_plane worker node as the source, bastion host as the destination
     c. user_plane worker node as the source, bastion host as the destination

     All 3 paths are functional.

  5. I checked the logs of the Lambda function that is responsible for joining the worker nodes to the cluster but didn't find any error!
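
Since allowing all inbound traffic from everywhere is broader than needed, here is a rough sketch of how the security group rules could be checked more narrowly. It works on the `IpPermissions` list returned by boto3's `describe_security_groups` call. The helper names and group IDs are hypothetical, and the ports are my understanding of the usual EKS minimum (TCP 443 from the nodes to the cluster security group, TCP 10250 from the cluster to the node security group), so please verify them against the EKS documentation.

```python
def allows_tcp_from_group(ip_permissions: list, port: int, source_group_id: str) -> bool:
    """True if any inbound rule admits TCP `port` from `source_group_id`.

    Only security-group sources are considered; CIDR-based rules are ignored.
    """
    for rule in ip_permissions:
        protocol = rule.get("IpProtocol")
        if protocol == "-1":
            port_ok = True  # "-1" means all protocols and all ports
        elif protocol == "tcp":
            port_ok = rule.get("FromPort", 0) <= port <= rule.get("ToPort", 65535)
        else:
            continue
        sources = rule.get("UserIdGroupPairs", [])
        if port_ok and any(s.get("GroupId") == source_group_id for s in sources):
            return True
    return False


def open_to_world(ip_permissions: list) -> bool:
    """True if any inbound rule accepts traffic from 0.0.0.0/0."""
    return any(
        r.get("CidrIp") == "0.0.0.0/0"
        for rule in ip_permissions
        for r in rule.get("IpRanges", [])
    )


if __name__ == "__main__":
    # Placeholder inbound rules for a cluster security group.
    cluster_rules = [
        {"IpProtocol": "tcp", "FromPort": 443, "ToPort": 443,
         "UserIdGroupPairs": [{"GroupId": "sg-nodes"}], "IpRanges": []},
    ]
    print(allows_tcp_from_group(cluster_rules, 443, "sg-nodes"))
    print(open_to_world(cluster_rules))
```

If the narrow check passes in both directions, the temporary allow-all rule can be removed again without affecting node registration.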

answered 2 months ago
