How do I troubleshoot Network Load Balancer health check failures for Amazon ECS tasks on Fargate?

I want to troubleshoot Network Load Balancer health check failures that I receive when running Amazon Elastic Container Service (Amazon ECS) tasks on AWS Fargate.

Short description

When health checks are configured, your Network Load Balancer periodically sends health check requests to each registered target. A TCP health check attempts to open a TCP connection on the specified port. If the connection can't be opened within the configured timeout, then the target is considered unhealthy. For a UDP service, target availability is tested using non-UDP health checks on your target group. For HTTP and HTTPS health checks, see How do I troubleshoot Application Load Balancer health check failures for Amazon ECS tasks on Fargate?

When your Amazon ECS tasks fail a Network Load Balancer health check, one of the following errors can appear in your Amazon ECS service event messages:

  • Health checks failed error - (service AWS-service) (port 80) is unhealthy in (target-group arn:aws:elasticloadbalancing:us-east-1:111111111111:targetgroup/aws-targetgroup/123456789) due to (reason Health checks failed)
  • Target is in an Availability Zone that is not turned on for the load balancer error - (service AWS-service) (port 80) is unhealthy in (target-group arn:aws:elasticloadbalancing:us-east-1:111111111111:targetgroup/aws-targetgroup/123456789) due to (reason Target is in an Availability Zone that is not enabled for the load balancer)

Example error message from your Amazon ECS task console:

Task failed ELB health checks in (target-group arn:aws:elasticloadbalancing:us-east-1:111111111111:targetgroup/aws-targetgroup/123456789)

If you receive container health check failures, then see How do I troubleshoot the container health check failures for Amazon ECS tasks?

If your Amazon ECS tasks have stopped, then see Checking stopped tasks for errors.

Note: Amazon ECS tasks can return an unhealthy status for multiple reasons. If the following resolution doesn't resolve your errors, then see Troubleshooting service load balancers.

Resolution

Important: Review all AWS Command Line Interface (AWS CLI) commands and replace all instances of example strings with your specific values. For example, replace example-task-private-ip with your specific task's private IP address.

Note: If you receive errors when running AWS CLI commands, then confirm that you're running a recent version of the AWS CLI.

Health checks failed

To troubleshoot your load balancer health check failures on your Amazon ECS Fargate tasks, complete the following steps:

  • Check the connectivity between your load balancer and Amazon ECS task
  • Confirm that your tasks respond correctly to manual checks within your Amazon Virtual Private Cloud (Amazon VPC)
  • Check the status and configuration of the application in your Amazon ECS container

Check the connectivity between your load balancer and Amazon ECS task

Make sure that your load balancer is allowed to perform health checks on your Amazon ECS tasks:

  • If your container is mapped to port 80, then confirm that your container security group allows inbound traffic on port 80.
  • Make sure that the security group for the Amazon ECS Fargate elastic network interface allows inbound traffic from the Amazon VPC CIDR range. This allows the Network Load Balancer nodes to reach the Amazon ECS tasks to perform health checks. For more information, see Target security groups.
  • Confirm that the network access control lists (network ACLs) associated with the subnets of the elastic network interface for your Fargate task allow ingress traffic on the health check port. Also, confirm that the network ACLs allow egress traffic on the ephemeral ports.
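
The security group and network ACL rules described above can be inspected from the AWS CLI. The following is a minimal sketch rather than a definitive procedure. The function names show_sg_ingress and show_subnet_nacl_entries are hypothetical helpers introduced for illustration, and the IDs are placeholders for the security group and subnet of your Fargate task's elastic network interface.

```shell
# Hypothetical helper: list the inbound rules of the task's security group.
# Arguments: security group ID, Region.
show_sg_ingress() {
  aws ec2 describe-security-groups \
    --group-ids "$1" \
    --region "$2" \
    --query "SecurityGroups[].IpPermissions[]"
}

# Hypothetical helper: list the entries of the network ACL associated with a subnet.
# Arguments: subnet ID, Region.
show_subnet_nacl_entries() {
  aws ec2 describe-network-acls \
    --filters "Name=association.subnet-id,Values=$1" \
    --region "$2" \
    --query "NetworkAcls[].Entries[]"
}

# Example usage (placeholder values):
#   show_sg_ingress <example-security-group-id> <example-region>
#   show_subnet_nacl_entries <example-subnet-id> <example-region>
```

In the network ACL output, entries with Egress set to false are inbound rules, and entries with Egress set to true are outbound rules.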

Confirm that your tasks respond correctly to manual checks within your Amazon VPC

From an Amazon Elastic Compute Cloud (Amazon EC2) instance within your Amazon VPC, confirm that your tasks respond correctly to manual checks:

Note: You can either create a cluster for the Amazon EC2 launch type or launch a new Amazon EC2 instance. If you don't want to launch an Amazon EC2 instance, then you can use the ECS Exec feature. To do this, launch a standalone task in the same Amazon VPC with --enable-execute-command.
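
The ECS Exec approach in the preceding note might look like the following sketch. The function names run_debug_task and open_task_shell are hypothetical helpers, and all arguments are placeholders. The sketch assumes that the task role grants the permissions that ECS Exec needs, that the Session Manager plugin is installed alongside the AWS CLI, and that the container image includes /bin/sh and a tool such as curl or nc.

```shell
# Hypothetical helper: start a standalone Fargate task with ECS Exec turned on.
# Arguments: cluster, task definition, comma-separated subnet IDs,
#            comma-separated security group IDs, Region.
run_debug_task() {
  aws ecs run-task \
    --cluster "$1" \
    --task-definition "$2" \
    --launch-type FARGATE \
    --enable-execute-command \
    --network-configuration "awsvpcConfiguration={subnets=[$3],securityGroups=[$4],assignPublicIp=DISABLED}" \
    --region "$5"
}

# Hypothetical helper: open an interactive shell in a container of that task.
# Arguments: cluster, task ID, container name, Region.
open_task_shell() {
  aws ecs execute-command \
    --cluster "$1" \
    --task "$2" \
    --container "$3" \
    --interactive \
    --command "/bin/sh" \
    --region "$4"
}

# Example usage (placeholder values):
#   run_debug_task example-cluster example-task-definition <example-subnet-ids> <example-security-group-ids> <example-region>
#   open_task_shell example-cluster <example-task-id> <example-container-name> <example-region>
```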

(Option 1) For HTTP health checks:

$ curl -Iv http://<example-task-private-ip>:<example-port>/<healthcheck_path>

Example output:

HTTP/1.1 200 OK

Note: For HTTP health checks, you can configure the target group to treat status codes in the range of 200-399 as successful.

(Option 2) For TCP health checks that don't use SSL with the targets:

$ nc -z -v -w10 example-task-private-ip example-port

Example output:

nc -z -v -w10 10.x.x.x 80
Connection to 10.x.x.x port 80 [tcp/http] succeeded!

(Option 3) For TCP health checks that require SSL for backend health checks:

$ nc -z -v -w10 --ssl example-task-private-ip example-port

Example output:

nc -z -v -w10 --ssl 10.x.x.x 443
Connection to 10.x.x.x port 443 [tcp/https] succeeded!
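
If nc isn't available in the environment where you run the manual check, then a plain TCP connection attempt can be approximated with Bash alone. The following is a minimal sketch under stated assumptions: tcp_check is a hypothetical helper name, Bash must be built with /dev/tcp network redirection support, and the timeout utility from GNU coreutils must be installed. It only imitates the "open a TCP connection within a timeout" behavior from inside your Amazon VPC. It doesn't originate from the load balancer nodes.

```shell
# Hypothetical helper: print "healthy" if a TCP connection to host:port opens
# within the timeout (in seconds), otherwise print "unhealthy".
# Arguments: host, port, timeout in seconds.
tcp_check() {
  # The inner Bash process receives the host as $0 and the port as $1.
  if timeout "$3" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null; then
    echo "healthy"
  else
    echo "unhealthy"
  fi
}

# Example usage (placeholder values):
#   tcp_check example-task-private-ip example-port 10
```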

Check the status and configuration of the application in your Amazon ECS container

  • Check that the health check port and the health check path for your target group are configured correctly.
  • Monitor the CPU and memory utilization metrics for your Amazon ECS service.
  • If your Amazon ECS task requires a longer health check grace period to register with the Network Load Balancer, then increase the value of healthCheckGracePeriodSeconds. To update the health check grace period, run the following command:
$ aws ecs update-service --cluster example-cluster --service example-service --region <example-region> --health-check-grace-period-seconds <example-value-in-seconds>
  • Check your application logs for application errors. For more information, see Viewing awslogs container logs in CloudWatch Logs.
  • Confirm the response code that your application sends on the HealthCheckPath. If your application has access logging configured, then search the logs for the ELB-HealthChecker/2.0 keyword to find the logged response. If you're using CloudWatch Logs, then use CloudWatch Logs Insights and run the following query:
fields @timestamp, @message
| sort @timestamp desc
| filter @message like /ELB-HealthChecker/
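
To cross-check the items in the preceding list, you can read the health check settings of the target group and the health state that the load balancer currently reports for each target. The following is a sketch, not a definitive procedure. The function names show_health_check_settings and show_target_health are hypothetical helpers, and the ARN and Region are placeholders.

```shell
# Hypothetical helper: show the health check settings of a target group.
# Arguments: target group ARN, Region.
show_health_check_settings() {
  aws elbv2 describe-target-groups \
    --target-group-arns "$1" \
    --region "$2" \
    --query "TargetGroups[].{Protocol:HealthCheckProtocol,Port:HealthCheckPort,Path:HealthCheckPath,Interval:HealthCheckIntervalSeconds,Timeout:HealthCheckTimeoutSeconds,HealthyThreshold:HealthyThresholdCount,UnhealthyThreshold:UnhealthyThresholdCount}"
}

# Hypothetical helper: show the state and reason code that is reported for each target.
# Arguments: target group ARN, Region.
show_target_health() {
  aws elbv2 describe-target-health \
    --target-group-arn "$1" \
    --region "$2" \
    --query "TargetHealthDescriptions[].{Target:Target.Id,Port:Target.Port,State:TargetHealth.State,Reason:TargetHealth.Reason,Description:TargetHealth.Description}"
}

# Example usage (placeholder values):
#   show_health_check_settings <example-arn-target-group> <example-region>
#   show_target_health <example-arn-target-group> <example-region>
```

The Reason and Description fields in the second command's output usually indicate whether a target failed its health checks or isn't in use by the load balancer.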

Target is in an Availability Zone that is not turned on for the load balancer

When you turn on an Availability Zone (AZ) for your load balancer, Elastic Load Balancing creates a load balancer node in the AZ. If you register targets in an AZ, then you must turn on that AZ so that the registered targets receive traffic. For more information, see Availability Zones and load balancer nodes.

To identify the Availability Zones that your load balancer is configured for, run the following command:

$ aws elbv2 describe-load-balancers --load-balancer-arn <example-arn-load-balancer> --region <example-region> --query "LoadBalancers[].AvailabilityZones[].ZoneName"

Note: You can't turn off Availability Zones for a Network Load Balancer after you create it, but you can turn on additional Availability Zones.

To identify the Availability Zones that your Amazon ECS Fargate task is configured for, run the following command:

Note: The following command returns subnets that your service is configured for.

$ aws ecs describe-services --cluster <example-cluster-name> --services <example-service-name> --region <example-region> --query "services[].networkConfiguration.awsvpcConfiguration.subnets"

To identify the Availability Zones of the preceding subnets, use the preceding subnet IDs in the following command:

Note: The following command returns Availability Zones that your service is configured for.

$ aws ec2 describe-subnets --subnet-ids <example-subnet-ids> --region <example-region> --query "Subnets[].AvailabilityZone"
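
After you have both lists of Availability Zones, you can compare them to find the zones that your tasks use but that aren't turned on for the load balancer. The following is a minimal sketch. The function name missing_azs is a hypothetical helper, and it assumes that each list is passed as one whitespace-separated string, which is the form that the preceding commands return when you add --output text.

```shell
# Hypothetical helper: print each Availability Zone from the task list
# that doesn't appear in the load balancer list.
# Arguments: load balancer AZs, task AZs (each a whitespace-separated string).
missing_azs() {
  # Unquoted expansion normalizes tabs and newlines to single spaces.
  lb_azs=" $(echo $1) "
  for az in $2; do
    case "$lb_azs" in
      *" $az "*) ;;        # AZ is turned on for the load balancer
      *) echo "$az" ;;     # AZ is used by tasks but not by the load balancer
    esac
  done
}

# Example usage (placeholder values):
#   lb_azs=$(aws elbv2 describe-load-balancers --load-balancer-arn <example-arn-load-balancer> --region <example-region> --query "LoadBalancers[].AvailabilityZones[].ZoneName" --output text)
#   task_azs=$(aws ec2 describe-subnets --subnet-ids <example-subnet-ids> --region <example-region> --query "Subnets[].AvailabilityZone" --output text)
#   missing_azs "$lb_azs" "$task_azs"
```

If the function prints nothing, then every Availability Zone that the service's subnets use is also turned on for the load balancer.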

Note: You can change the subnet configuration of an Amazon ECS service with the AWS CLI update-service command.
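
A subnet change with update-service might look like the following sketch. The function name set_service_subnets is a hypothetical helper, and all arguments are placeholders. The sketch assumes that you pass the complete awsvpcConfiguration, including the security groups and the public IP setting, so review the current values from describe-services before you run it.

```shell
# Hypothetical helper: update the subnets that an Amazon ECS service uses.
# Arguments: cluster, service, comma-separated subnet IDs,
#            comma-separated security group IDs, ENABLED or DISABLED, Region.
set_service_subnets() {
  aws ecs update-service \
    --cluster "$1" \
    --service "$2" \
    --network-configuration "awsvpcConfiguration={subnets=[$3],securityGroups=[$4],assignPublicIp=$5}" \
    --region "$6"
}

# Example usage (placeholder values):
#   set_service_subnets example-cluster example-service <example-subnet-ids> <example-security-group-ids> DISABLED <example-region>
```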

Related information

Troubleshooting service load balancers

Health checks for your target groups

AWS OFFICIAL
Updated a year ago