How do I troubleshoot Application Load Balancer health check failures for Amazon ECS tasks on Fargate?

I want to resolve Application Load Balancer health check failures when running Amazon Elastic Container Service (Amazon ECS) tasks on AWS Fargate.

Short description

When Amazon ECS tasks fail Application Load Balancer health checks, you might see one of the following errors in your Amazon ECS service event messages:

  • Request timed out
  • Health checks failed with no error codes
  • Health checks failed with 404 or 5xx error codes
  • Target is in an Availability Zone that is not turned on for the load balancer

For failed container health checks, see How do I troubleshoot the container health check failures for Amazon ECS tasks?

If you're using Amazon ECS with Amazon Elastic Compute Cloud (Amazon EC2) container instances, then see the following documentation:

Resolution

Note: If you receive errors when running AWS Command Line Interface (AWS CLI) commands, confirm that you're running a recent version of the AWS CLI. In the following AWS CLI commands, replace the example values with your values.

Request timed out error

Check the security groups to make sure that the load balancer can make health check requests to the Fargate task. The Fargate task security group must allow inbound and outbound traffic on the container port that's specified in the task definition. The source must be the Application Load Balancer security group. The Application Load Balancer security group must allow outbound traffic to the Fargate task security group.

Note: It's a best practice to configure separate security groups for your Fargate task and your load balancer, and then allow traffic between the two security groups.
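To review the rules described above, you can list the inbound rules of the Fargate task security group and the outbound rules of the load balancer security group. The following is a minimal sketch: the function names are hypothetical, and the security group IDs that you pass in are your own values.

```shell
# Hypothetical helper names; the AWS CLI calls and output fields are standard.

# List the inbound rules of a security group (pass the Fargate task security group).
# Confirm that the container port appears with the load balancer security group as a source.
show_inbound_rules() {
  aws ec2 describe-security-groups --group-ids "$1" \
    --query 'SecurityGroups[].IpPermissions[].{Protocol:IpProtocol,From:FromPort,To:ToPort,SourceGroups:UserIdGroupPairs[].GroupId}' \
    --output json
}

# List the outbound rules of a security group (pass the load balancer security group).
# Confirm that traffic to the Fargate task security group is allowed.
show_outbound_rules() {
  aws ec2 describe-security-groups --group-ids "$1" \
    --query 'SecurityGroups[].IpPermissionsEgress[].{Protocol:IpProtocol,From:FromPort,To:ToPort,DestinationGroups:UserIdGroupPairs[].GroupId}' \
    --output json
}

# Example usage (replace the IDs with your values):
#   show_inbound_rules  <example-task-security-group-id>
#   show_outbound_rules <example-alb-security-group-id>
```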

If the security groups allow communication between your Fargate task and Application Load Balancer, then check the HealthCheckTimeoutSeconds value in the health check settings of your target group. If necessary, slightly increase the timeout.

Note: Increase HealthCheckTimeoutSeconds only if your application takes a long time to respond to a health check.
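To see the current health check settings and change the timeout, you can use the describe-target-groups and modify-target-group commands. The following is a sketch under the assumption that you know the target group ARN; the function names are hypothetical.

```shell
# Hypothetical helper names; the AWS CLI calls and flags are standard.

# Show the health check settings of a target group.
show_health_check_settings() {
  aws elbv2 describe-target-groups --target-group-arns "$1" \
    --query 'TargetGroups[].{Path:HealthCheckPath,Port:HealthCheckPort,Interval:HealthCheckIntervalSeconds,Timeout:HealthCheckTimeoutSeconds,Healthy:HealthyThresholdCount,Unhealthy:UnhealthyThresholdCount,Matcher:Matcher.HttpCode}' \
    --output table
}

# Set HealthCheckTimeoutSeconds on a target group.
# $1 = target group ARN, $2 = new timeout in seconds
set_health_check_timeout() {
  aws elbv2 modify-target-group --target-group-arn "$1" \
    --health-check-timeout-seconds "$2"
}

# Example usage (replace the values with your own):
#   show_health_check_settings <example-target-group-arn>
#   set_health_check_timeout   <example-target-group-arn> <example-value-in-seconds>
```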

To measure how long your application takes to respond to a health check request, run the following command:

$ time curl -Iv http://<example-task-pvt-ip>:<example-port>/<example_healthcheck_path>

Note: High resource utilization on tasks might cause slowness or a hung process and result in a health check failure.
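A single request can be an outlier, so it can help to repeat the request and average the results. The following sketch does that with curl's time_total write-out variable. The function names, the default of 5 requests, and the 10-second cap per request are assumptions for illustration.

```shell
# Hypothetical helper names; adjust the defaults to suit your application.

# Average a column of numbers (one per line) read from standard input.
average() {
  awk '{ sum += $1; n++ } END { if (n > 0) printf "%.3f\n", sum / n }'
}

# Send COUNT requests to URL and print the mean total time in seconds.
# $1 = URL of the health check path, $2 = number of requests (default 5)
average_response_time() {
  url="$1"
  count="${2:-5}"
  i=0
  while [ "$i" -lt "$count" ]; do
    # Discard the body and print only the total time for each request.
    curl -s -o /dev/null --max-time 10 -w '%{time_total}\n' "$url"
    i=$((i + 1))
  done | average
}

# Example usage (replace the values with your own):
#   average_response_time "http://<example-task-pvt-ip>:<example-port>/<example_healthcheck_path>" 10
```

If the average is close to or higher than HealthCheckTimeoutSeconds, then the timeout is a likely cause of the failures.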

Health checks failed with no error codes

Example health check failed error message:

(service AWS-service) (port 80) is unhealthy in (target-group arn:aws:elasticloadbalancing:us-east-1:111111111111:targetgroup/aws-targetgroup/123456789) due to (reason Health checks failed)

If you receive a similar error message, then check that the task quickly responds after it starts in Amazon ECS. Also, check that the application replies with the correct response code.
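The service event message gives only a summary. To see the state and reason that Elastic Load Balancing reports for each registered target, you can query the target group directly. This is a sketch; the function name is hypothetical, and the target group ARN is your own value.

```shell
# Hypothetical helper name; the AWS CLI call and output fields are standard.

# Show the health state, reason code, and description for each target.
show_target_health() {
  aws elbv2 describe-target-health --target-group-arn "$1" \
    --query 'TargetHealthDescriptions[].{Target:Target.Id,Port:Target.Port,AZ:Target.AvailabilityZone,State:TargetHealth.State,Reason:TargetHealth.Reason,Description:TargetHealth.Description}' \
    --output table
}

# Example usage (replace the ARN with your value):
#   show_target_health <example-target-group-arn>
```

The Reason and Description columns usually indicate whether the failure is a timeout, a response code mismatch, or a target that isn't in use.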

Make sure that the task has time to respond after it starts in Amazon ECS

To make sure that the task has sufficient time to respond after starting, increase the healthCheckGracePeriodSeconds. This allows Amazon ECS to retain the task for a longer time period, and ignore unhealthy Elastic Load Balancing target health checks.

Note: If you're creating a new service, then you can configure the health check grace period on the load balancer configuration page.

To update the healthCheckGracePeriodSeconds for your existing Amazon ECS service, run the following command:

$ aws ecs update-service --cluster <EXAMPLE-CLUSTER-NAME> --service <EXAMPLE-SERVICE-NAME> --region <EXAMPLE-REGION> --health-check-grace-period-seconds <example-value-in-seconds>

Check that the application replies with the correct response code

To confirm the response code that your application sent on the health check path, use the following methods.

If you configured access logging on your application, then search the logs for the ELB-HealthChecker/2.0 user agent to check the response. If your application logs are in Amazon CloudWatch Logs, then run the following query in CloudWatch Logs Insights:

fields @timestamp, @message
  | sort @timestamp desc
  | filter @message like /ELB-HealthChecker/

From an Amazon EC2 instance in the same Amazon Virtual Private Cloud (Amazon VPC), run the following commands to confirm that your tasks respond to manual checks. To launch a new Amazon EC2 instance, see Tutorial: Get started with Amazon EC2 Linux instances.

HTTP health checks

$ curl -Iv http://<example-task-pvt-ip>:<example-port>/<example_healthcheck_path>

HTTPS health checks

$ curl -Iv https://<example-task-pvt-ip>:<example-port>/<example_healthcheck_path>

If tasks stop quickly and you can't get their private IP addresses, then launch a standalone task outside of the Amazon ECS service to troubleshoot the issue. Launch the task with the same task definition, and then run a curl command against its IP address. A standalone task doesn't stop because of a health check failure.
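The standalone task approach can be sketched with the run-task, wait, and describe-tasks commands. The function names are hypothetical, and the sketch assumes a private subnet (assignPublicIp=DISABLED) that can still pull the container image. Change that setting if your task runs in a public subnet.

```shell
# Hypothetical helper names; the AWS CLI calls and flags are standard.

# Start one standalone Fargate task and print its task ARN.
# $1 = cluster, $2 = task definition, $3 = subnet IDs (comma-separated),
# $4 = security group IDs (comma-separated)
run_standalone_task() {
  aws ecs run-task --cluster "$1" --task-definition "$2" \
    --launch-type FARGATE --count 1 \
    --network-configuration "awsvpcConfiguration={subnets=[$3],securityGroups=[$4],assignPublicIp=DISABLED}" \
    --query 'tasks[0].taskArn' --output text
}

# Wait until the task is running, and then print its private IPv4 address.
# $1 = cluster, $2 = task ARN
task_private_ip() {
  aws ecs wait tasks-running --cluster "$1" --tasks "$2"
  aws ecs describe-tasks --cluster "$1" --tasks "$2" \
    --query "tasks[0].attachments[0].details[?name=='privateIPv4Address'].value | [0]" \
    --output text
}

# Example usage (replace the values with your own):
#   task_arn="$(run_standalone_task <EXAMPLE-CLUSTER-NAME> <example-task-definition> <example-subnet-id> <example-security-group-id>)"
#   task_private_ip <EXAMPLE-CLUSTER-NAME> "$task_arn"
```

Stop the standalone task with the stop-task command when you finish troubleshooting.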

Also, use Amazon ECS Exec to check the listening ports at the container level. Use netstat to confirm that the application is listening on the expected port:

$ netstat -tulpn | grep LISTEN
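To run that check from your workstation, you can pass the command to the execute-command operation. This sketch assumes that ECS Exec is turned on for the task, that the Session Manager plugin is installed locally, and that the container image includes netstat. The function name is hypothetical.

```shell
# Hypothetical helper name; the AWS CLI call and flags are standard.

# Run a command inside a container of a running task with ECS Exec.
# $1 = cluster, $2 = task ID or ARN, $3 = container name, $4 = command to run
exec_in_container() {
  aws ecs execute-command --cluster "$1" --task "$2" --container "$3" \
    --interactive --command "$4"
}

# Example usage (replace the values with your own):
#   exec_in_container <EXAMPLE-CLUSTER-NAME> <example-task-id> <example-container-name> "netstat -tulpn"
```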

Health checks failed with 404 or 5xx error codes

Health check failures with 404 or 5xx error codes indicate that the application received the health check request, but returned an invalid response code. The codes also indicate that the response code that the application sent doesn't match the success code that's configured on the target group level (parameter: Matcher).

A 404 error code can occur when a health check path doesn't exist, or there's a typo in the configuration of the health check path. A 5xx error code can occur when the application that's inside the task isn't correctly replying to the request, or there's a processing error.
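To compare the code that your application returns with the success codes in Matcher, you can use a small check like the following sketch. It assumes that the Matcher value is a comma-separated list of single codes and ranges, such as 200,301-302. The function name is hypothetical.

```shell
# Hypothetical helper name; pure shell, no AWS calls.

# Return 0 if the HTTP code is accepted by the Matcher value, else return 1.
# $1 = HTTP status code, $2 = Matcher value such as "200", "200,202", or "200-299"
matcher_accepts() {
  code="$1"
  matcher="$2"
  old_ifs="$IFS"
  IFS=','
  for part in $matcher; do
    case "$part" in
      *-*)
        # A range such as 200-299: compare against both bounds.
        low="${part%-*}"
        high="${part#*-}"
        if [ "$code" -ge "$low" ] && [ "$code" -le "$high" ]; then
          IFS="$old_ifs"
          return 0
        fi
        ;;
      *)
        # A single code such as 200.
        if [ "$code" -eq "$part" ]; then
          IFS="$old_ifs"
          return 0
        fi
        ;;
    esac
  done
  IFS="$old_ifs"
  return 1
}

# Example usage (replace the values with your own):
#   matcher="$(aws elbv2 describe-target-groups --target-group-arns <example-target-group-arn> \
#     --query 'TargetGroups[0].Matcher.HttpCode' --output text)"
#   code="$(curl -s -o /dev/null -w '%{http_code}' "http://<example-task-pvt-ip>:<example-port>/<example_healthcheck_path>")"
#   matcher_accepts "$code" "$matcher" && echo "match" || echo "mismatch: got $code, expected $matcher"
```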

To determine whether your application is starting successfully, check your application logs.

Target is in an Availability Zone that is not turned on for the load balancer

When an Availability Zone is turned on for your load balancer, Elastic Load Balancing creates a load balancer node in the Availability Zone. If you register targets in an Availability Zone and don't turn on the Availability Zone, then the registered targets don't receive traffic. For more information, see Availability Zones and load balancer nodes.

To identify the Availability Zones that your load balancer is configured for, run the following command:

$ aws elbv2 describe-load-balancers --load-balancer-arns <EXAMPLE-ALB-ARN> --query 'LoadBalancers[*].AvailabilityZones[].{AvailabilityZone:ZoneName,Subnet:SubnetId}'

To identify the subnets, and therefore the Availability Zones, that your Fargate tasks are configured for, run the following command:

$ aws ecs describe-services --cluster <EXAMPLE-CLUSTER-NAME> --services <EXAMPLE-SERVICE-NAME> --query 'services[*].deployments[].networkConfiguration[].awsvpcConfiguration.{Subnets:subnets}'

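Note: To change the subnet configuration of an Amazon ECS service, use the update-service AWS CLI command. To add an Availability Zone to an existing Application Load Balancer, use the set-subnets AWS CLI command. The set-subnets command replaces the existing subnet list, so include the subnets that are already turned on. The enable-availability-zones-for-load-balancer command applies only to Classic Load Balancers.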
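To find the mismatch quickly, you can compare the Availability Zones of the task subnets with the Availability Zones that are turned on for the load balancer. The following sketch takes both lists as whitespace-separated strings. The function name is hypothetical, and the commented commands that produce the lists use your own ARN and subnet IDs.

```shell
# Hypothetical helper name; the comparison itself is pure shell.

# Print each Availability Zone from the first list that is missing from the second list.
# $1 = zones used by the task subnets, $2 = zones turned on for the load balancer
zones_not_enabled() {
  task_zones="$1"
  # Normalize tabs and newlines to single spaces, with a space on both ends.
  alb_zones=" $(printf '%s ' $2)"
  for zone in $task_zones; do
    case "$alb_zones" in
      *" $zone "*) ;;                  # zone is turned on for the load balancer
      *) printf '%s\n' "$zone" ;;      # zone is used by tasks but not turned on
    esac
  done
}

# Example usage (replace the values with your own):
#   alb_zones="$(aws elbv2 describe-load-balancers --load-balancer-arns <EXAMPLE-ALB-ARN> \
#     --query 'LoadBalancers[0].AvailabilityZones[].ZoneName' --output text)"
#   task_zones="$(aws ec2 describe-subnets --subnet-ids <example-subnet-id-1> <example-subnet-id-2> \
#     --query 'Subnets[].AvailabilityZone' --output text)"
#   zones_not_enabled "$task_zones" "$alb_zones"
```

Any zone that the function prints is used by your tasks but isn't turned on for the load balancer.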

Related information

Troubleshooting service load balancers

Health checks for your target groups

AWS OFFICIAL
Updated a year ago