Thanks for your reply. AWSSupport-TroubleshootEKSWorkerNode succeeds with a warning: "No secondary private IP addresses are assigned to worker node i-XYZ, ensure that the CNI plugin is running properly."
My aws-node pods have failing liveness and readiness probes. I extended the probe timeout from 5s to 10s in the DaemonSet, but that didn't fix it.
Warning Unhealthy 42s kubelet Readiness probe failed: {"level":"info","ts":"2024-05-30T10:16:56.989Z","caller":"/usr/local/go/src/runtime/proc.go:267","msg":"timeout: failed to connect service \":50051\" within 10s"}
Warning Unhealthy 21s kubelet Readiness probe failed: {"level":"info","ts":"2024-05-30T10:17:17.088Z","caller":"/usr/local/go/src/runtime/proc.go:267","msg":"timeout: failed to connect service \":50051\" within 10s"}
Warning Unhealthy 11s (x6 over 82s) kubelet Readiness probe failed: command "/app/grpc-health-probe -addr=:50051 -connect-timeout=10s -rpc-timeout=10s" timed out
Warning Unhealthy 4s (x2 over 14s) kubelet Liveness probe failed: command "/app/grpc-health-probe -addr=:50051 -connect-timeout=10s -rpc-timeout=10s" timed out
Warning Unhealthy 1s kubelet Readiness probe failed: {"level":"info","ts":"2024-05-30T10:17:37.185Z","caller":"/usr/local/go/src/runtime/proc.go:267","msg":"timeout: failed to connect service \":50051\" within 10s"}
My ipamd.log (cat /var/log/aws-routed-eni/ipamd.log) says:
{"level":"info","ts":"2024-05-30T10:22:14.629Z","caller":"aws-k8s-agent/main.go:42","msg":"Starting L-IPAMD ..."}
{"level":"info","ts":"2024-05-30T10:22:14.629Z","caller":"aws-k8s-agent/main.go:53","msg":"Testing communication with server"}
{"level":"error","ts":"2024-05-30T10:22:19.629Z","caller":"wait/loop.go:53","msg":"Unable to reach API Server, Get \"https://172.20.0.1:443/version?timeout=5s\": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)"}
{"level":"error","ts":"2024-05-30T10:22:24.630Z","caller":"wait/loop.go:87","msg":"Unable to reach API Server, Get \"https://172.20.0.1:443/version?timeout=5s\": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)"}
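To pull only the failures out of a longer ipamd.log, a small filter along these lines could help. This is a minimal sketch, assuming every line is a JSON object with the `level`, `ts`, and `msg` fields seen in the excerpt above; the function name `error_entries` is hypothetical and not part of the VPC CNI.

```python
import json


def error_entries(lines):
    """Return (timestamp, message) pairs for error-level log entries.

    Assumes one JSON object per line with "level", "ts" and "msg" keys,
    as in the ipamd.log excerpt. Blank lines and lines that are not
    valid JSON objects are skipped rather than reported.
    """
    hits = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue
        if not isinstance(entry, dict):
            continue
        if entry.get("level") == "error":
            hits.append((entry.get("ts"), entry.get("msg")))
    return hits
```

On the node, the function can be fed the open file object for /var/log/aws-routed-eni/ipamd.log, which usually requires root to read.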
My subnets have more than 8,000 IP addresses available.
Checking node access to API endpoint
nc -vz XYZ.gr7.eu-central-1.eks.amazonaws.com
Ncat: Version 7.50 ( https://nmap.org/ncat )
Ncat: Connection to XYZ failed: Connection timed out.
Ncat: Trying next address...
Ncat: Connection timed out.
The node has internet access. It can, for example, ping 8.8.8.8.
The API server endpoint access is set to public, and the allowlist is set to 0.0.0.0/0. My node is in a private subnet. The subnet has a route to a NAT gateway that is active and has a primary public IP assigned.
What else can cause my node to be unable to access the API server endpoint?
Hello,
I can confirm that Amazon VPC CNI v1.18.1-eksbuild.3 is compatible with EKS version 1.25 https://docs.aws.amazon.com/eks/latest/userguide/managing-vpc-cni.html
I would encourage you to follow these troubleshooting guides to identify why the readiness checks are failing for this pod:
- https://github.com/aws/amazon-vpc-cni-k8s/issues/1038
- https://github.com/aws/amazon-vpc-cni-k8s/blob/master/docs/troubleshooting.md
Additionally, you can run the SSM automation (AWSSupport-TroubleshootEKSWorkerNode) to help you identify and troubleshoot common causes that prevent worker nodes from joining a cluster.
Important: For the automation to work, your worker nodes must have permission to access Systems Manager and have Systems Manager running. To grant permission, attach the AmazonSSMManagedInstanceCore AWS managed policy to the IAM role that corresponds to your EC2 instance profile. This is the default configuration for EKS managed node groups that are created through eksctl.
Thanks for your reply.
- Ensure the Amazon EKS security group allows outbound traffic to port 443 (HTTPS) and allows inbound traffic on port 443 from the security group associated with your nodes.
I'm using the same security group for EKS and for the nodes. Outbound is open for all IPv4 traffic (all protocols, all ports) to destination 0.0.0.0/0. Inbound, there is one rule for the same security group with all protocols and all ports. This doesn't seem to be the cause of my issue.
- Verify that the network ACLs (NACLs) associated with the subnets where your nodes and API server reside allow the necessary traffic.
For my VPC there is one network ACL with a single wide-open inbound rule, and the same for outbound. This doesn't seem to be the cause of my issue.
- Check DNS resolution for the API endpoint and check that it resolves to the correct IP address.
nslookup for the API endpoint returns 2 IPs. This doesn't seem to be the cause of my issue.
- nc -vz <resolved_ip> 443
nc -vz XYZ.gr7.eu-central-1.eks.amazonaws.com
Ncat: Version 7.50 ( https://nmap.org/ncat )
Ncat: Connection to XYZ failed: Connection timed out.
Ncat: Trying next address...
Ncat: Connection timed out.
nc is still not successful.
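The DNS and nc steps above can also be combined into one check that resolves the endpoint and tries a TCP connection to every returned address on an explicitly given port, matching the suggested `nc -vz <resolved_ip> 443`. The following is only a sketch: the function name `check_endpoint` and the 5 second default timeout are my assumptions, and it tests the TCP handshake only, not TLS or the Kubernetes API itself.

```python
import socket


def check_endpoint(host, port=443, timeout=5.0):
    """Resolve host and attempt a TCP connect to each resolved address.

    Returns a list of (ip, ok, detail) tuples, one per distinct address.
    Only the TCP handshake is attempted; no TLS or HTTP request is sent.
    A DNS failure is not caught and surfaces as socket.gaierror.
    """
    results = []
    seen = set()
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    for family, socktype, proto, _canonname, sockaddr in infos:
        ip = sockaddr[0]
        if ip in seen:
            continue
        seen.add(ip)
        try:
            with socket.socket(family, socktype, proto) as sock:
                sock.settimeout(timeout)
                sock.connect(sockaddr)
            results.append((ip, True, "connected"))
        except OSError as exc:
            # Covers timeouts, refused connections and unreachable routes.
            results.append((ip, False, str(exc)))
    return results
```

Run from the node against the cluster endpoint hostname (XYZ.gr7.eu-central-1.eks.amazonaws.com in this thread), it shows whether each resolved address times out or connects on port 443.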
Any other idea is highly appreciated.
Hi,
I hope this helps with troubleshooting. Let me know if you need anything else; I would be happy to assist.