Worker nodes can't register with the control plane

I am working with EKS self-managed worker nodes. The control plane is managed by AWS, with its ENIs in 2 AZs in Paris in "wavelength_vpc". The worker nodes are self-managed in the Casablanca Wavelength Zone, also in "wavelength_vpc", and start up with user data through a launch template and an Auto Scaling group. The cluster endpoint is both public and private. When I run "kubectl get nodes" I get "No resources found". As far as I know, this is because the worker nodes are not registering with the control plane, which can happen for these reasons:

1- aws-auth doesn't have the same role the worker nodes have (which is not the case for me). Here's my aws-auth:

    provider "kubernetes" {
      host                   = var.cluster_endpoint
      token                  = data.aws_eks_cluster_auth.cluster.token
      cluster_ca_certificate = base64decode(var.cluster_ca_certificate)
    }

    data "aws_eks_cluster_auth" "cluster" {
      name = var.cluster_name
    }

    resource "kubernetes_config_map" "aws_auth" {
      metadata {
        name      = "aws-auth"
        namespace = "kube-system"
      }

      data = {
        mapRoles = yamlencode([
          for arn in var.worker_roles : {
            rolearn  = arn
            username = "system:node:{{EC2PrivateDNSName}}"
            groups   = ["system:bootstrappers", "system:nodes"]
          }
        ])
      }
    }

and when I call it in the root main, I pass the worker node role:

    module "aws_auth" {
      source                 = "../../modules/aws-auth"
      cluster_name           = module.eks.cluster_name
      cluster_endpoint       = module.eks.cluster_endpoint
      cluster_ca_certificate = module.eks.certificate_authority
      worker_roles           = [module.eks_node_group_iam_role.role_arn]
    }

2- The user data in the launch template may not have the appropriate code. Here's mine. I think it is good, except for the API server endpoint: I am using cluster_endpoint, so I am using the public endpoint, and I think I should use the private endpoint, because the worker nodes and the ENIs are in the same VPC:

    #!/bin/bash
    set -ex
    echo "Starting EKS bootstrap..." >> /var/log/custom-user-data.log
    /etc/eks/bootstrap.sh ${cluster_name} \
      --apiserver-endpoint "${cluster_endpoint}" \
      --b64-cluster-ca "${cluster_ca}" \
      --dns-cluster-ip "${dns_ip}" \
      --container-runtime containerd \
      --kubelet-extra-args "--max-pods=${max_pods}" \
      --use-max-pods false
    echo "Bootstrap done." >> /var/log/custom-user-data.log

Besides, the log only shows "Starting EKS bootstrap...", not "Bootstrap done."

3- Security group: I should allow ingress on 443, because that is where the control plane talks to the worker nodes, and egress to the control plane on port 443 ... Here's my security group:

    resource "aws_security_group" "worker_nodes" {
      name        = "${var.cluster_name}-worker-nodes-sg"
      description = "Security group for EKS worker nodes"
      vpc_id      = var.vpc_id

      ingress {
        description = "Allow node-to-node communication"
        from_port   = 0
        to_port     = 0
        protocol    = "-1"
        self        = true
      }

      ingress {
        description = "Allow EKS control plane to communicate with nodes"
        from_port   = 443
        to_port     = 443
        protocol    = "tcp"
        cidr_blocks = ["0.0.0.0/0"] # change later to the control plane subnets
      }

      ingress {
        description = "Control plane to kubelet"
        from_port   = 10250
        to_port     = 10250
        protocol    = "tcp"
        cidr_blocks = ["0.0.0.0/0"]
      }

      egress {
        description = "Allow outbound HTTPS traffic to EKS control plane"
        from_port   = 443
        to_port     = 443
        protocol    = "tcp"
        cidr_blocks = ["0.0.0.0/0"]
      }

      tags = {
        Name = "${var.cluster_name}-worker-sg"
      }
    }

Where might the problem be coming from? When I run "sudo journalctl -u kubelet -f" I get:

    [ec2-user@ip-10-0-1-192 ~]$ sudo journalctl -u kubelet -f
    -- Logs begin at Tue 2025-08-05 22:15:22 UTC. --
    Aug 05 22:21:43 ip-10-0-1-192.eu-west-3.compute.internal kubelet[3118]: E0805 22:21:43.571856 3118 kubelet_node_status.go:96] "Unable to register node with API server" err="Post "https://4FEBAFF2CF63722795990A047628F73D.gr7.eu-west-3.eks.amazonaws.com/api/v1/nodes\": dial tcp 10.0.3.150:443: i/o timeout" node="ip-10-0-1-192.eu-west-3.compute.internal"
    Aug 05 22:21:46 ip-10-0-1-192.eu-west-3.compute.internal kubelet[3118]: E0805 22:21:46.535207 3118 eviction_manager.go:282] "Eviction manager: failed to get summary stats" err="failed to get node info: node "ip-10-0-1-192.eu-west-3.compute.internal" not found"
    Aug 05 22:21:50 ip-10-0-1-192.eu-west-3.compute.internal kubelet[3118]: E0805 22:21:50.445790 3118 event.go:368] "Unable to write event (may retry after sleeping)" err="Post "https://4FEBAFF2CF63722795990A047628F73D.gr7.eu-west-3.eks.amazonaws.com/api/v1/namespaces/default/events\": dial tcp 10.0.2.215:443: i/o timeout" event="&Event{ObjectMeta:{ip-10-0-1-192.eu-west-3.compute.internal.1858fedc1e1a6fcf default 0 0001-01-01 00:00:00 +0000 UTC <nil> <nil> map[] map[] [] [] []},InvolvedObject:ObjectReference{Kind:Node,Namespace:,Name:ip-10-0-1-192.eu-west-3.compute.internal,UID:ip-10-0-1-192.eu-west-3.compute.internal,APIVersion:,ResourceVersion:,FieldPath:,},Reason:Starting,Message:Starting kubelet.,Source:EventSource{Component:kubelet,Host:ip-10-0-1-192.eu-west-3.compute.internal,},FirstTimestamp:2025-08-05 22:17:56.212117455 +0000 UTC m=+0.409318920,LastTimestamp:2025-08-05 22:17:56.212117455 +0000 UTC m=+0.409318920,Count:1,Type:Normal,EventTime:0001-01-01 00:00:00 +0000 UTC,Series:nil,Action:,Related:nil,ReportingController:kubelet,ReportingInstance:ip-10-0-1-192.eu-west-3.compute.internal,}"
    Aug 05 22:21:50 ip-10-0-1-192.eu-west-3.compute.internal kubelet[3118]: I0805 22:21:50.573044 3118 kubelet_node_status.go:356] "Setting node annotation to enable volume controller attach/detach"
    Aug 05 22:21:50 ip-10-0-1-192.eu-west-3.compute.internal kubelet[3118]: I0805 22:21:50.574106 3118 kubelet_node_status.go:684] "Recording event message for node" node="ip-10-0-1-192.eu-west-3.compute.internal" event="NodeHasSufficientMemory"
    Aug 05 22:21:50 ip-10-0-1-192.eu-west-3.compute.internal kubelet[3118]: I0805 22:21:50.574142 3118 kubelet_node_status.go:684] "Recording event message for node" node="ip-10-0-1-192.eu-west-3.compute.internal" event="NodeHasNoDiskPressure"
    Aug 05 22:21:50 ip-10-0-1-192.eu-west-3.compute.internal kubelet[3118]: I0805 22:21:50.574157 3118 kubelet_node_status.go:684] "Recording event message for node" node="ip-10-0-1-192.eu-west-3.compute.internal" event="NodeHasSufficientPID"
    Aug 05 22:21:50 ip-10-0-1-192.eu-west-3.compute.internal kubelet[3118]: I0805 22:21:50.574193 3118 kubelet_node_status.go:73] "Attempting to register node" node="ip-10-0-1-192.eu-west-3.compute.internal"
    Aug 05 22:21:51 ip-10-0-1-192.eu-west-3.compute.internal kubelet[3118]: W0805 22:21:51.876562 3118 transport.go:356] Unable to cancel request for *otelhttp.Transport
    Aug 05 22:21:51 ip-10-0-1-192.eu-west-3.compute.internal kubelet[3118]: E0805 22:21:51.876648 3118 controller.go:145] "Failed to ensure lease exists, will retry" err="Get "https://4FEBAFF2CF63722795990A047628F73D.gr7.eu-west-3.eks.amazonaws.com/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/ip-10-0-1-192.eu-west-3.compute.internal?timeout=10s\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)" interval="7s"
    Aug 05 22:21:56 ip-10-0-1-192.eu-west-3.compute.internal kubelet[3118]: E0805 22:21:56.536971 3118 eviction_manager.go:282] "Eviction manager: failed to get summary stats" err="failed to get node info: node "ip-10-0-1-192.eu-west-3.compute.internal" not found"
    Aug 05 22:22:06 ip-10-0-1-192.eu-west-3.compute.internal kubelet[3118]: E0805 22:22:06.537962 3118 eviction_manager.go:282] "Eviction manager: failed to get summary stats" err="failed to get node info: node "ip-10-0-1-192.eu-west-3.compute.internal" not found"
    Aug 05 22:22:07 ip-10-0-1-192.eu-west-3.compute.internal kubelet[3118]: I0805 22:22:07.956675 3118 csi_plugin.go:892] Failed to contact API server when waiting for CSINode publishing: Get "https://4FEBAFF2CF63722795990A047628F73D.gr7.eu-west-3.eks.amazonaws.com/apis/storage.k8s.io/v1/csinodes/ip-10-0-1-192.eu-west-3.compute.internal?resourceVersion=0": dial tcp 10.0.2.215:443: i/o timeout
    Aug 05 22:22:08 ip-10-0-1-192.eu-west-3.compute.internal kubelet[3118]: W0805 22:22:08.878048 3118 transport.go:356] Unable to cancel request for *otelhttp.Transport
    Aug 05 22:22:08 ip-10-0-1-192.eu-west-3.compute.internal kubelet[3118]: E0805 22:22:08.878250 3118 controller.go:145] "Failed to ensure lease exists, will retry" err="Get "https://4FEBAFF2CF63722795990A047628F73D.gr7.eu-west-3.eks.amazonaws.com/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/ip-10-0-1-192.eu-west-3.compute.internal?timeout=10s\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)" interval="7s"
    Aug 05 22:22:16 ip-10-0-1-192.eu-west-3.compute.internal kubelet[3118]: E0805 22:22:16.539115 3118 eviction_manager.go:282] "Eviction manager: failed to get summary stats" err="failed to get node info: node "ip-10-0-1-192.eu-west-3.compute.internal" not found"
    Aug 05 22:22:20 ip-10-0-1-192.eu-west-3.compute.internal kubelet[3118]: E0805 22:22:20.575660 3118 kubelet_node_status.go:96] "Unable to register node with API server" err="Post "https://4FEBAFF2CF63722795990A047628F73D.gr7.eu-west-3.eks.amazonaws.com/api/v1/nodes\": dial tcp 10.0.2.215:443: i/o timeout" node="ip-10-0-1-192.eu-west-3.compute.internal"
    Aug 05 22:22:25 ip-10-0-1-192.eu-west-3.compute.internal kubelet[3118]: W0805 22:22:25.880269 3118 transport.go:356] Unable to cancel request for *otelhttp.Transport
    Aug 05 22:22:25 ip-10-0-1-192.eu-west-3.compute.internal kubelet[3118]: E0805 22:22:25.880464 3118 controller.go:145] "Failed to ensure lease exists, will retry" err="Get "https://4FEBAFF2CF63722795990A047628F73D.gr7.eu-west-3.eks.amazonaws.com/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/ip-10-0-1-192.eu-west-3.compute.internal?timeout=10s\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)" interval="7s"
    Aug 05 22:22:26 ip-10-0-1-192.eu-west-3.compute.internal kubelet[3118]: E0805 22:22:26.540656 3118 eviction_manager.go:282] "Eviction manager: failed to get summary stats" err="failed to get node info: node "ip-10-0-1-192.eu-west-3.compute.internal" not found"
    Aug 05 22:22:27 ip-10-0-1-192.eu-west-3.compute.internal kubelet[3118]: I0805 22:22:27.575934 3118 kubelet_node_status.go:356] "Setting node annotation to enable volume controller attach/detach"
    Aug 05 22:22:27 ip-10-0-1-192.eu-west-3.compute.internal kubelet[3118]: I0805 22:22:27.576852 3118 kubelet_node_status.go:684] "Recording event message for node" node="ip-10-0-1-192.eu-west-3.compute.internal" event="NodeHasSufficientMemory"
    Aug 05 22:22:27 ip-10-0-1-192.eu-west-3.compute.internal kubelet[3118]: I0805 22:22:27.576901 3118 kubelet_node_status.go:684] "Recording event message for node" node="ip-10-0-1-192.eu-west-3.compute.internal" event="NodeHasNoDiskPressure"
    Aug 05 22:22:27 ip-10-0-1-192.eu-west-3.compute.internal kubelet[3118]: I0805 22:22:27.576912 3118 kubelet_node_status.go:684] "Recording event message for node" node="ip-10-0-1-192.eu-west-3.compute.internal" event="NodeHasSufficientPID"
    Aug 05 22:22:27 ip-10-0-1-192.eu-west-3.compute.internal kubelet[3118]: I0805 22:22:27.576937 3118 kubelet_node_status.go:73] "Attempting to register node" node="ip-10-0-1-192.eu-west-3.compute.internal"
    Aug 05 22:22:30 ip-10-0-1-192.eu-west-3.compute.internal kubelet[3118]: E0805 22:22:30.447393 3118 event.go:368] "Unable to write event (may retry after sleeping)" err="Post "https://4FEBAFF2CF63722795990A047628F73D.gr7.eu-west-3.eks.amazonaws.com/api/v1/namespaces/default/events\": dial tcp 10.0.2.215:443: i/o timeout" event="&Event{ObjectMeta:{ip-10-0-1-192.eu-west-3.compute.internal.1858fedc1e1a6fcf default 0 0001-01-01 00:00:00 +0000 UTC <nil> <nil> map[] map[] [] [] []},InvolvedObject:ObjectReference{Kind:Node,Namespace:,Name:ip-10-0-1-192.eu-west-3.compute.internal,UID:ip-10-0-1-192.eu-west-3.compute.internal,APIVersion:,ResourceVersion:,FieldPath:,},Reason:Starting,Message:Starting kubelet.,Source:EventSource{Component:kubelet,Host:ip-10-0-1-192.eu-west-3.compute.internal,},FirstTimestamp:2025-08-05 22:17:56.212117455 +0000 UTC m=+0.409318920,LastTimestamp:2025-08-05 22:17:56.212117455 +0000 UTC m=+0.409318920,Count:1,Type:Normal,EventTime:0001-01-01 00:00:00 +0000 UTC,Series:nil,Action:,Related:nil,ReportingController:kubelet,ReportingInstance:ip-10-0-1-192.eu-west-3.compute.internal,}"
    Aug 05 22:22:36 ip-10-0-1-192.eu-west-3.compute.internal kubelet[3118]: E0805 22:22:36.541889 3118 eviction_manager.go:282] "Eviction manager: failed to get summary stats" err="failed to get node info: node "ip-10-0-1-192.eu-west-3.compute.internal" not found"
    Aug 05 22:22:38 ip-10-0-1-192.eu-west-3.compute.internal kubelet[3118]: I0805 22:22:38.957169 3118 csi_plugin.go:892] Failed to contact API server when waiting for CSINode publishing: Get "https://4FEBAFF2CF63722795990A047628F73D.gr7.eu-west-3.eks.amazonaws.com/apis/storage.k8s.io/v1/csinodes/ip-10-0-1-192.eu-west-3.compute.internal?resourceVersion=0": dial tcp 10.0.2.215:443: i/o timeout
    Aug 05 22:22:42 ip-10-0-1-192.eu-west-3.compute.internal kubelet[3118]: W0805 22:22:42.881370 3118 transport.go:356] Unable to cancel request for *otelhttp.Transport
    Aug 05 22:22:42 ip-10-0-1-192.eu-west-3.compute.internal kubelet[3118]: E0805 22:22:42.881521 3118 controller.go:145] "Failed to ensure lease exists, will retry" err="Get "https://4FEBAFF2CF63722795990A047628F73D.gr7.eu-west-3.eks.amazonaws.com/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/ip-10-0-1-192.eu-west-3.compute.internal?timeout=10s\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)" interval="7s"
    Aug 05 22:22:46 ip-10-0-1-192.eu-west-3.compute.internal kubelet[3118]: E0805 22:22:46.542037 3118 eviction_manager.go:282] "Eviction manager: failed to get summary stats" err="failed to get node info: node "ip-10-0-1-192.eu-west-3.compute.internal" not found"
    Aug 05 22:22:48 ip-10-0-1-192.eu-west-3.compute.internal kubelet[3118]: W0805 22:22:48.342774 3118 reflector.go:547] k8s.io/client-go/informers/factory.go:160: failed to list *v1.Service: Get "https://4FEBAFF2CF63722795990A047628F73D.gr7.eu-west-3.eks.amazonaws.com/api/v1/services?limit=500&resourceVersion=0": dial tcp 10.0.2.215:443: i/o timeout
    Aug 05 22:22:48 ip-10-0-1-192.eu-west-3.compute.internal kubelet[3118]: I0805 22:22:48.342915 3118 trace.go:236] Trace[2140293762]: "Reflector ListAndWatch" name:k8s.io/client-go/informers/factory.go:160 (05-Aug-2025 22:22:18.342) (total time: 30000ms):
    Aug 05 22:22:48 ip-10-0-1-192.eu-west-3.compute.internal kubelet[3118]: Trace[2140293762]: ---"Objects listed" error:Get "https://4FEBAFF2CF63722795990A047628F73D.gr7...

asked 10 months ago, 139 views
2 Answers
Accepted Answer

Hi there,

Let’s begin with the nature of AWS Wavelength Zones, which are fundamentally different from traditional Availability Zones (AZs). Unlike AZs, Wavelength Zones are not interconnected. They follow a hub-and-spoke model, connecting back to a single parent AWS Region but not to each other.

Now, a few key limitations and architectural considerations you should keep in mind when using Amazon EKS with AWS Wavelength:

EKS Control Plane Placement

  • Amazon EKS requires two subnets for the Control Plane.
  • Wavelength Zones are not supported for this purpose — the Control Plane must reside in standard AZ subnets.
  • Also note: Local EKS clusters (similar to those supported on AWS Outposts) are not available on Wavelength.
    Reference: AWS Docs – Local Clusters on EKS Outposts
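
To check which subnets the control plane ENIs actually use, and which zone each one sits in, something like the following sketch may help. It assumes the AWS CLI is installed and configured for the cluster's Region; `list_cluster_subnet_zones` is a hypothetical helper name and the cluster name is whatever you pass in.

```shell
# Hypothetical helper: print "<subnet-id> <zone-name>" for every subnet
# registered with the EKS cluster (these are the control plane ENI subnets).
list_cluster_subnet_zones() {
  cluster="$1"
  subnet_ids=$(aws eks describe-cluster --name "$cluster" \
    --query 'cluster.resourcesVpcConfig.subnetIds' --output text) || return 1
  # Intentionally unquoted: the IDs are whitespace-separated and must
  # split into separate arguments.
  aws ec2 describe-subnets --subnet-ids $subnet_ids \
    --query 'Subnets[].[SubnetId,AvailabilityZone]' --output text
}

# Example (cluster name is a placeholder):
#   list_cluster_subnet_zones my-cluster
```

Each subnet listed should report a standard AZ name such as eu-west-3a; a Wavelength Zone subnet in that list would indicate a control plane placement problem.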

Worker Node Configuration

  • You mentioned using self-managed nodes, which is good, since managed node groups and AWS Fargate are currently unsupported on AWS Wavelength.
  • Ensure your nodes use the EKS-optimized AMI and that the kubelet is properly configured to connect to the EKS Control Plane in the parent region.

Cross-Zone DNS Resolution Issues

If your worker nodes span multiple Wavelength Zones (e.g., Paris and Casablanca), you may encounter DNS resolution failures due to how traffic is isolated:

  • A CoreDNS pod may be scheduled in Wavelength Zone A (e.g., Casablanca).
  • A workload in Wavelength Zone B (e.g., Paris) might try to resolve DNS via that pod.
  • This traffic is blocked, as Wavelength Zones cannot communicate directly.
  • The default CoreDNS failover might not resolve this cleanly, and DNS queries could fail silently.

So, try these first:

  • Pin CoreDNS pods to standard AZs or specific zones using nodeAffinity or topologySpreadConstraints.
  • Run a local DNS cache (e.g., dnsmasq) within each Wavelength Zone to reduce reliance on remote CoreDNS pods.
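
As a rough illustration of the nodeAffinity option, the sketch below patches the CoreDNS Deployment so its pods only schedule onto nodes in the listed zones. It assumes CoreDNS runs as the Deployment `coredns` in `kube-system` (the EKS default), that schedulable nodes exist in those zones, and that zone names like eu-west-3a are only examples. `pin_coredns_to_zones` is a hypothetical helper name. The patch likely replaces any required node-affinity terms already on the Deployment, and a managed CoreDNS add-on may revert manual edits.

```shell
# Hypothetical helper: require CoreDNS pods to run only on nodes whose
# topology.kubernetes.io/zone label is in the given JSON array of zones.
pin_coredns_to_zones() {
  zones_json="$1"   # e.g. '["eu-west-3a","eu-west-3b"]'
  kubectl -n kube-system patch deployment coredns --type strategic -p "{
    \"spec\": {\"template\": {\"spec\": {\"affinity\": {\"nodeAffinity\": {
      \"requiredDuringSchedulingIgnoredDuringExecution\": {\"nodeSelectorTerms\": [
        {\"matchExpressions\": [
          {\"key\": \"topology.kubernetes.io/zone\", \"operator\": \"In\", \"values\": $zones_json}
        ]}
      ]}
    }}}}}
  }"
}

# Example (zone names are placeholders):
#   pin_coredns_to_zones '["eu-west-3a","eu-west-3b"]'
```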
answered 10 months ago
  • Thank you, Mr Cesar, for your feedback. My problem was that I was not allowing port 443 on the ENIs' security group; I assumed AWS would open it for me. Thanks again for your time.

Based on the logs and information provided, your worker nodes are failing to register with the EKS control plane due to network connectivity issues. The key error in your logs is: "dial tcp 10.0.3.150:443: i/o timeout" and similar timeout errors when trying to reach the API server.

The main issues appear to be:

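  1. Network Connectivity Issues: Your worker nodes in the Casablanca Wavelength Zone cannot communicate with the control plane ENIs in Paris. The logs show timeout errors when trying to connect to the EKS API server. Because the Wavelength Zone subnet and the control plane subnets belong to the same VPC, this is not cross-region traffic; a security group, network ACL, or route table that blocks port 443 on that path is the more likely cause.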

  2. Bootstrap Script Completion: Your bootstrap script is only logging "Starting EKS bootstrap..." but not "Bootstrap done." This indicates the bootstrap process is failing to complete.
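
One way to separate a network block from an authentication problem is to test raw TCP reachability from the node to the API server addresses that appear in the kubelet log. The sketch below assumes bash and the coreutils `timeout` command are available on the node; `check_tcp` is a hypothetical helper name, not part of any AWS tooling.

```shell
# Hypothetical helper: exit 0 if a TCP connection to host:port opens
# within the timeout (default 5 seconds), non-zero otherwise.
check_tcp() {
  host="$1"; port="$2"; secs="${3:-5}"
  # bash's /dev/tcp pseudo-device opens a plain TCP connection;
  # $0 and $1 inside the inner script are the host and port passed below.
  timeout "$secs" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}

# Example, run on the worker node (addresses taken from the kubelet log):
#   check_tcp 10.0.2.215 443 && echo reachable || echo "blocked or timed out"
#   check_tcp 10.0.3.150 443 && echo reachable || echo "blocked or timed out"
```

A failure here points to security groups, network ACLs, or routing rather than to aws-auth, since the connection never gets far enough for authentication to matter. The 10.0.x.x addresses in the log also suggest the endpoint already resolves to the private ENIs.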

Here are some solutions to try:

  1. Network Configuration:
  • Check the network path between the Wavelength Zone subnet in Casablanca and the control plane subnets in Paris
  • Check that the route tables and network ACLs of those subnets allow traffic between them
  • If a NAT gateway or transit gateway sits on that path, verify that it is configured correctly
  2. Endpoint Configuration:
  • With both public and private access enabled, the cluster endpoint hostname resolves to the private ENI addresses from inside the VPC; the 10.0.x.x addresses in your log show the nodes are already targeting the private side
  • The bootstrap script should use an endpoint that is reachable from your worker nodes
  3. Security Group Configuration:
  • Your worker node security group allows ingress on ports 443 and 10250, which looks good, but set the CIDR blocks to the control plane subnets specifically rather than 0.0.0.0/0
  4. Access Entry for Node IAM Role:
  • For newer EKS versions that use access entries instead of the aws-auth ConfigMap, you might need to create an access entry for your node IAM role using:
    aws eks create-access-entry --cluster-name your-cluster-name --principal-arn your-node-role-arn --type EC2_LINUX
  5. AMI Compatibility:
  • Ensure you're using an EKS-optimized AMI that's compatible with your cluster version
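
In this thread the asker later reported that the missing piece was an inbound rule for port 443 on the security group attached to the control plane ENIs. A minimal sketch of adding such a rule with the AWS CLI is shown below. `allow_nodes_to_api` and its arguments are hypothetical placeholders, and the sketch assumes the rule is added to the cluster security group that EKS creates. If you attached your own security group to the cluster, a rule on that group should work as well, because rules across attached groups are additive.

```shell
# Hypothetical helper: allow the worker node security group to reach the
# control plane ENIs on TCP 443 by adding an ingress rule to the cluster
# security group.
allow_nodes_to_api() {
  cluster="$1"; node_sg="$2"
  cluster_sg=$(aws eks describe-cluster --name "$cluster" \
    --query 'cluster.resourcesVpcConfig.clusterSecurityGroupId' \
    --output text) || return 1
  aws ec2 authorize-security-group-ingress \
    --group-id "$cluster_sg" --protocol tcp --port 443 \
    --source-group "$node_sg"
}

# Example (both values are placeholders):
#   allow_nodes_to_api my-cluster sg-0123456789abcdef0
```

If the infrastructure is managed with Terraform, as in the question, the same rule would normally be declared there instead so that the next apply does not remove it.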

The most likely issue is the network connectivity between your Wavelength zones in Casablanca and the control plane in Paris. You should focus on troubleshooting the network path between these locations, checking route tables, and ensuring that the necessary ports are open in both directions.
Sources
EKS node creation failed | AWS re:Post
Troubleshoot problems with Amazon EKS clusters and nodes - Amazon EKS
EKS node failed to register to cluster | AWS re:Post

answered 10 months ago