How do I resolve a failed health check for a load balancer in Amazon EKS?
The load balancer for my Amazon Elastic Kubernetes Service (Amazon EKS) cluster keeps failing its health checks.
Short description
To troubleshoot health check issues with the load balancer in your Amazon EKS cluster, complete the steps in the following sections:
- Check the status of the pod
- Check the pod and service label selectors
- Check for missing endpoints
- Check the service traffic policy and cluster security groups for issues with Application Load Balancers
- Verify that your service is configured for targetPort
- Verify that your AWS Load Balancer Controller has the correct permissions
- Check the ingress annotations for issues with Application Load Balancers
- Check the Kubernetes Service annotations for issues with Network Load Balancers
- Manually test a health check
- Check the networking
- Restart the kube-proxy
Resolution
Check the status of the pod
Check that the pod is in the Running status and that the containers in the pod are ready:
$ kubectl get pod -n YOUR_NAMESPACE
Note: Replace YOUR_NAMESPACE with your Kubernetes namespace.
Example output:
NAME      READY   STATUS    RESTARTS   AGE
podname   1/1     Running   0          16s
Note: If the application container in the pod isn't running, then the pod can't answer the load balancer health check and the target fails the health check.
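If the pod isn't in Running status, or the containers aren't ready, then check the pod events and container logs to find the cause. For example:
$ kubectl describe pod POD_NAME -n YOUR_NAMESPACE
$ kubectl logs POD_NAME -n YOUR_NAMESPACE
Note: Replace POD_NAME with your pod name and YOUR_NAMESPACE with your Kubernetes namespace.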
Check the pod and service label selectors
For pod labels, run the following command:
$ kubectl get pod -n YOUR_NAMESPACE --show-labels
Example output:
NAME                           READY   STATUS    RESTARTS   AGE     LABELS
alb-instance-6cc5cd9b9-prnxw   1/1     Running   0          2d19h   app=alb-instance,pod-template-hash=6cc5cd9b9
To verify that your Kubernetes Service is using the pod labels, run the following command to check that its output matches the pod labels:
$ kubectl get svc SERVICE_NAME -n YOUR_NAMESPACE -o=jsonpath='{.spec.selector}{"\n"}'
Note: Replace SERVICE_NAME with your Kubernetes Service and YOUR_NAMESPACE with your Kubernetes namespace.
Example output:
{"app":"alb-instance"}
Check for missing endpoints
The Kubernetes controller for the service selector continuously scans for pods that match its selector, and then posts updates to an endpoint object. If you selected an incorrect label, then no endpoint appears.
Run the following command:
$ kubectl describe svc SERVICE_NAME -n YOUR_NAMESPACE
Example output:
Name:                     alb-instance
Namespace:                default
Labels:                   <none>
Annotations:              <none>
Selector:                 app=alb-instance-1
Type:                     NodePort
IP Family Policy:         SingleStack
IP Families:              IPv4
IP:                       10.100.44.151
IPs:                      10.100.44.151
Port:                     http  80/TCP
TargetPort:               80/TCP
NodePort:                 http  32663/TCP
Endpoints:                <none>
Session Affinity:         None
External Traffic Policy:  Cluster
Events:                   <none>
Check if the endpoint is missing:
$ kubectl get endpoints SERVICE_NAME -n YOUR_NAMESPACE
Example output:
NAME           ENDPOINTS   AGE
alb-instance   <none>      2d20h
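If the endpoints are missing because the service selector doesn't match the pod labels, then you can correct the selector in place. For example, the following command assumes that the correct pod label is app=alb-instance:
$ kubectl patch svc SERVICE_NAME -n YOUR_NAMESPACE -p '{"spec":{"selector":{"app":"alb-instance"}}}'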
Check the service traffic policy and cluster security groups for issues with Application Load Balancers
Targets in Application Load Balancer target groups become unhealthy for two reasons. Either the service traffic policy, spec.externalTrafficPolicy, is set to Local instead of Cluster, or the node groups in the cluster are associated with different cluster security groups and traffic can't flow freely between the node groups.
Verify that the traffic policy is correctly configured:
$ kubectl get svc SERVICE_NAME -n YOUR_NAMESPACE -o=jsonpath='{.spec.externalTrafficPolicy}{"\n"}'
Example output:
Local
Change the setting to Cluster:
$ kubectl edit svc SERVICE_NAME -n YOUR_NAMESPACE
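Or, instead of editing the service interactively, you can patch the traffic policy directly. For example:
$ kubectl patch svc SERVICE_NAME -n YOUR_NAMESPACE -p '{"spec":{"externalTrafficPolicy":"Cluster"}}'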
Check the cluster security groups
1. Open the Amazon EC2 console.
2. Select the healthy instance.
3. Choose the Security tab and check the security group ingress rules.
4. Select the unhealthy instance.
5. Choose the Security tab and check the security group ingress rules.
If the security groups for the instances are different, then modify the security group ingress rules in the Amazon EC2 console:
1. From the Security tab, select the security group ID.
2. Choose the Edit inbound rules button to modify ingress rules.
3. Add inbound rules to allow traffic from the other node groups in the cluster.
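For example, the following AWS CLI command adds an inbound rule that allows all traffic from another node group's security group. The security group IDs are placeholders for your own values:
$ aws ec2 authorize-security-group-ingress --group-id sg-0123456789abcdef0 --protocol -1 --source-group sg-0fedcba9876543210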
Verify that your service is configured for targetPort
Your targetPort must match the containerPort in the pod that the service is sending traffic to.
To check the configured targetPort, run the following command:
$ kubectl get svc SERVICE_NAME -n YOUR_NAMESPACE -o=jsonpath="{.metadata.name}{'\t'}{.spec.ports[*].targetPort}{'\t'}{.spec.ports[*].protocol}{'\n'}"
Example output:
alb-instance 8080 TCP
In the preceding example output, the targetPort is set to 8080. However, because the containerPort is set to 80, you must change the targetPort to 80.
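For example, assuming that the port that must change is the first port listed on the service, the following JSON patch sets the targetPort to 80:
$ kubectl patch svc SERVICE_NAME -n YOUR_NAMESPACE --type='json' -p='[{"op":"replace","path":"/spec/ports/0/targetPort","value":80}]'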
Verify that your AWS Load Balancer Controller has the correct permissions
The AWS Load Balancer Controller must have the correct permissions to update security groups to allow traffic from the load balancer to instances or pods. If the controller doesn't have the correct permissions, then you receive errors.
Check for errors in the AWS Load Balancer Controller deployment logs:
$ kubectl logs deploy/aws-load-balancer-controller -n kube-system
Check for errors in the individual controller pod logs:
$ kubectl logs CONTROLLER_POD_NAME -n YOUR_NAMESPACE
Note: Replace CONTROLLER_POD_NAME with your controller pod name and YOUR_NAMESPACE with your Kubernetes namespace.
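If the logs show authorization errors, then confirm that the controller's service account is annotated with an IAM role that has the required permissions. For example, the following command assumes the default service account name aws-load-balancer-controller and that you use IAM roles for service accounts (IRSA):
$ kubectl describe sa aws-load-balancer-controller -n kube-system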
Check the ingress annotations for issues with Application Load Balancers
For issues with the Application Load Balancer, check the Kubernetes ingress annotations:
$ kubectl describe ing INGRESS_NAME -n YOUR_NAMESPACE
Note: Replace INGRESS_NAME with the name of your Kubernetes Ingress and YOUR_NAMESPACE with your Kubernetes namespace.
Example output:
Name:             alb-instance-ingress
Namespace:        default
Address:          k8s-default-albinsta-fcb010af73-2014729787.ap-southeast-2.elb.amazonaws.com
Default backend:  alb-instance:80 (192.168.81.137:8080)
Rules:
  Host          Path  Backends
  ----          ----  --------
  awssite.cyou
                /     alb-instance:80 (192.168.81.137:8080)
Annotations:    alb.ingress.kubernetes.io/scheme: internet-facing
                kubernetes.io/ingress.class: alb
Events:
  Type    Reason                  Age                  From     Message
  ----    ------                  ----                 ----     -------
  Normal  SuccessfullyReconciled  25m (x7 over 2d21h)  ingress  Successfully reconciled
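For example, the following ingress annotations control how the Application Load Balancer runs its health checks. The values, including the /healthz path, are examples only; adjust them for your application:
alb.ingress.kubernetes.io/healthcheck-protocol: HTTP
alb.ingress.kubernetes.io/healthcheck-port: traffic-port
alb.ingress.kubernetes.io/healthcheck-path: /healthz
alb.ingress.kubernetes.io/healthcheck-interval-seconds: '15'
alb.ingress.kubernetes.io/healthcheck-timeout-seconds: '5'
alb.ingress.kubernetes.io/healthy-threshold-count: '2'
alb.ingress.kubernetes.io/unhealthy-threshold-count: '2'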
To find ingress annotations that are specific to your use case, see Ingress annotations (from the Kubernetes website).
Check the Kubernetes Service annotations for issues with Network Load Balancers
For issues with the Network Load Balancer, check the Kubernetes Service annotations:
$ kubectl describe svc SERVICE_NAME -n YOUR_NAMESPACE
Example output:
Name:                     nlb-ip
Namespace:                default
Labels:                   <none>
Annotations:              service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip
                          service.beta.kubernetes.io/aws-load-balancer-scheme: internet-facing
                          service.beta.kubernetes.io/aws-load-balancer-type: external
Selector:                 app=nlb-ip
Type:                     LoadBalancer
IP Family Policy:         SingleStack
IP Families:              IPv4
IP:                       10.100.161.91
IPs:                      10.100.161.91
LoadBalancer Ingress:     k8s-default-nlbip-fff2442e46-ae4f8cf4a182dc4d.elb.ap-southeast-2.amazonaws.com
Port:                     http  80/TCP
TargetPort:               80/TCP
NodePort:                 http  31806/TCP
Endpoints:                192.168.93.144:80
Session Affinity:         None
External Traffic Policy:  Cluster
Events:                   <none>
Note: Take note of the pod IP address in the Endpoints field. This is the APPLICATION_POD_IP that you need later to manually run a health check command.
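For example, the following Kubernetes Service annotations control how the Network Load Balancer runs its health checks. The values, including the /healthz path, are examples only; adjust them for your application:
service.beta.kubernetes.io/aws-load-balancer-healthcheck-protocol: HTTP
service.beta.kubernetes.io/aws-load-balancer-healthcheck-port: traffic-port
service.beta.kubernetes.io/aws-load-balancer-healthcheck-path: /healthz
service.beta.kubernetes.io/aws-load-balancer-healthcheck-interval: '10'
service.beta.kubernetes.io/aws-load-balancer-healthcheck-healthy-threshold: '3'
service.beta.kubernetes.io/aws-load-balancer-healthcheck-unhealthy-threshold: '3'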
To find Kubernetes Service annotations that are specific to your use case, see Service annotations (from the Kubernetes website).
Manually test a health check
Check your application pod IP address:
$ kubectl get pod -n YOUR_NAMESPACE -o wide
Run a test pod within the cluster to manually test the health check:
$ kubectl run -n YOUR_NAMESPACE troubleshoot -it --rm --image=amazonlinux -- /bin/bash
For HTTP health checks:
# curl -Iv APPLICATION_POD_IP/HEALTH_CHECK_PATH
Note: Replace APPLICATION_POD_IP with your application pod IP and HEALTH_CHECK_PATH with the ALB Target group health check path.
Example command:
# curl -Iv 192.168.81.137
Example output:
*   Trying 192.168.81.137:80...
* Connected to 192.168.81.137 (192.168.81.137) port 80 (#0)
> HEAD / HTTP/1.1
> Host: 192.168.81.137
> User-Agent: curl/7.78.0
> Accept: */*
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
HTTP/1.1 200 OK
< Server: nginx/1.21.3
Server: nginx/1.21.3
< Date: Tue, 26 Oct 2021 05:10:17 GMT
Date: Tue, 26 Oct 2021 05:10:17 GMT
< Content-Type: text/html
Content-Type: text/html
< Content-Length: 615
Content-Length: 615
< Last-Modified: Tue, 07 Sep 2021 15:21:03 GMT
Last-Modified: Tue, 07 Sep 2021 15:21:03 GMT
< Connection: keep-alive
Connection: keep-alive
< ETag: "6137835f-267"
ETag: "6137835f-267"
< Accept-Ranges: bytes
Accept-Ranges: bytes
<
* Connection #0 to host 192.168.81.137 left intact
Check the HTTP response status code. If the status code is 200 OK, then your application is responding correctly on the health check path.
If the HTTP response status code is 3xx or 4xx, then you can change your health check path to one that responds with 200 OK. For example, the following annotation sets the health check path:
alb.ingress.kubernetes.io/healthcheck-path: /ping
-or-
You can use the following annotation on the ingress resource to add a successful health check response status code range:
alb.ingress.kubernetes.io/success-codes: 200-399
For TCP health checks, first install netcat (nc) in the test pod:
# yum update -y && yum install -y nc
Test the TCP health checks:
# nc -z -v APPLICATION_POD_IP CONTAINER_PORT_NUMBER
Note: Replace APPLICATION_POD_IP with your application pod IP and CONTAINER_PORT_NUMBER with your container port.
Example command:
# nc -z -v 192.168.81.137 80
Example output:
Ncat: Version 7.50 ( https://nmap.org/ncat )
Ncat: Connected to 192.168.81.137:80.
Ncat: 0 bytes sent, 0 bytes received in 0.01 seconds.
Check the networking
For networking issues, verify the following:
- The multiple node groups in the EKS cluster can freely communicate with each other
- The network access control list (network ACL) that's associated with the subnet where your pods are running allows traffic from the load balancer subnet CIDR range
- The network ACL that's associated with your load balancer subnet allows return traffic on the ephemeral port range from the subnet where the pods are running (to review the ACL rules for a subnet, see the example command after this list)
- The route table allows local traffic from within the VPC CIDR range
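For example, to review the network ACL rules that are associated with a specific subnet, you can run a command like the following. The subnet ID is a placeholder for your own value:
$ aws ec2 describe-network-acls --filters Name=association.subnet-id,Values=subnet-0123456789abcdef0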
Restart the kube-proxy
If the kube-proxy that runs on each node isn't behaving correctly, then it could fail to update the iptables rules for the service and endpoints. Restart the kube-proxy to force it to recheck and update iptables rules:
$ kubectl rollout restart daemonset.apps/kube-proxy -n kube-system
Example output:
daemonset.apps/kube-proxy restarted
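To confirm that the restart finished and that all kube-proxy pods are running again, you can check the rollout status:
$ kubectl rollout status daemonset/kube-proxy -n kube-system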
Related information
How do I troubleshoot service load balancers for Amazon EKS?