How do I troubleshoot 504 errors that I receive when I use an Application Load Balancer?

I want to troubleshoot the ELB_504 error that I receive in my Application Load Balancer access logs or Amazon CloudWatch metrics. Or, I receive an HTTP 504 error when I connect to my service through an Application Load Balancer.

Short description

An HTTP 504 error occurs when a gateway or proxy times out. Application Load Balancer HTTP 504 errors can occur for the following reasons:

  • The load balancer failed to establish a connection to the target before the 10-second connection timeout expired.
  • The load balancer encountered an SSL handshake timeout when it connected to a target.
    Note: You can't adjust the 10-second SSL handshake timeout.
  • The load balancer established a connection to the target, but the target didn't respond before the idle timeout period passed.
  • The target returned a Content-Length header value that's larger than the entity body, and the load balancer timed out.
  • High traffic caused the targets to respond more slowly.
  • The target is an AWS Lambda function and the service didn't respond before the connection timeout expired.
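
The Content-Length cause in the preceding list can be checked on the target before a response is sent. The following is a minimal sketch, not an AWS-provided tool. The helper name content_length_mismatch is hypothetical, and the sketch assumes that your application exposes the response headers as a dictionary and the body as bytes.

```python
def content_length_mismatch(headers: dict, body: bytes) -> bool:
    """Return True when the declared Content-Length is larger than the body.

    Hypothetical helper for illustration only. Header names are matched
    case-insensitively because HTTP header names are not case-sensitive.
    """
    declared = None
    for name, value in headers.items():
        if name.lower() == "content-length":
            declared = int(value)
            break
    if declared is None:
        # No Content-Length header, so there is nothing to compare.
        return False
    return declared > len(body)
```

A result of True means that the load balancer would wait for bytes that never arrive, which matches the timeout cause described in the list.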

Resolution

Check that your load balancer allows traffic to and from registered targets

Check the TargetConnectionErrorCount CloudWatch metric with the Sum statistic. If you see positive data points instead of 0, then there are connection issues between the load balancer and the target.
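
The metric check above can be sketched in Python. This example only builds the request parameters and filters the results, so it runs without AWS credentials. The function names and the load balancer dimension value (app/my-lb/0123456789abcdef) are hypothetical placeholders, and the 5-minute period and 3-hour window are assumptions that you can adjust.

```python
from datetime import datetime, timedelta, timezone


def connection_error_query(load_balancer_dimension: str, hours: int = 3, now=None) -> dict:
    """Build parameters for the CloudWatch GetMetricStatistics API.

    load_balancer_dimension is the LoadBalancer dimension value,
    for example "app/my-lb/0123456789abcdef" (a made-up placeholder).
    """
    now = now or datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/ApplicationELB",
        "MetricName": "TargetConnectionErrorCount",
        "Dimensions": [{"Name": "LoadBalancer", "Value": load_balancer_dimension}],
        "StartTime": now - timedelta(hours=hours),
        "EndTime": now,
        "Period": 300,  # Assumed 5-minute granularity.
        "Statistics": ["Sum"],
    }


def positive_datapoints(datapoints: list) -> list:
    """Return the datapoints whose Sum is greater than 0, oldest first."""
    hits = [dp for dp in datapoints if dp.get("Sum", 0) > 0]
    return sorted(hits, key=lambda dp: dp["Timestamp"])
```

If you use boto3, then you can pass the parameters to boto3.client("cloudwatch").get_metric_statistics(**params) and pass the Datapoints list from the response to positive_datapoints.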

To resolve these issues, check the security groups that are associated with the load balancer and the backend targets. Make sure that the security groups allow traffic between the load balancer and targets in both directions on the traffic and health check ports. Also, confirm that the subnet's network access control list (network ACL) allows traffic from the targets to the load balancer nodes on the ephemeral ports (1024-65535).
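
The security group check above can be partly automated. The following sketch assumes that the target's security group references the load balancer's security group by ID, and that the rules are in the IpPermissions shape that the Amazon EC2 DescribeSecurityGroups API returns. The function name and the group IDs are hypothetical. The sketch doesn't evaluate CIDR-based rules or network ACLs.

```python
def allows_port_from_group(ip_permissions: list, source_group_id: str, port: int) -> bool:
    """Return True if an inbound rule allows TCP traffic on `port`
    from the security group `source_group_id`.

    Only rules that reference a security group are evaluated.
    CIDR-based rules (IpRanges) are ignored in this sketch.
    """
    for rule in ip_permissions:
        protocol = rule.get("IpProtocol")
        if protocol == "-1":
            # "-1" means all protocols and all ports.
            port_ok = True
        elif protocol == "tcp":
            port_ok = rule.get("FromPort", 0) <= port <= rule.get("ToPort", 65535)
        else:
            continue
        if not port_ok:
            continue
        groups = {pair.get("GroupId") for pair in rule.get("UserIdGroupPairs", [])}
        if source_group_id in groups:
            return True
    return False
```

Run the check once for the traffic port and once for the health check port. If you use boto3, then the rules are available from boto3.client("ec2").describe_security_groups(GroupIds=[...]) under SecurityGroups[0]["IpPermissions"].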

Note: It's a best practice to use specific security group rules for your Application Load Balancer.

Check your load balancer metrics

To determine why your targets are marked as Unhealthy, check the CloudWatch metrics for your Application Load Balancer. If there isn't HTTPCode_ELB_504_Count metric data, then your application servers returned the 504 errors, not the load balancer. Also, check whether the maximum value for the TargetResponseTime metric frequently exceeds the idle timeout value. Response times that exceed the idle timeout can cause 504 errors.
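
Access logs offer a second way to see whether the load balancer or the target generated a 504 error. The following sketch assumes the documented Application Load Balancer access log layout, where the ninth and tenth space-separated fields are elb_status_code and target_status_code. The function name classify_504 is hypothetical.

```python
def classify_504(log_line: str) -> str:
    """Classify one Application Load Balancer access log entry.

    Returns "load-balancer" when the load balancer generated the 504,
    "target" when the target returned it, and "not-504" otherwise.
    The first ten fields of an entry contain no spaces, so a plain
    split is enough to reach the two status code fields.
    """
    fields = log_line.split(" ")
    elb_status, target_status = fields[8], fields[9]
    if elb_status != "504":
        return "not-504"
    if target_status == "504":
        return "target"
    # A target_status_code of "-" means that the target sent no response.
    return "load-balancer"
```

Entries that are classified as "load-balancer" correspond to the HTTPCode_ELB_504_Count metric. Entries that are classified as "target" point to the application.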

Also, check the following CPU and memory utilization metrics in your targets based on the resource type:

  • For Amazon Elastic Compute Cloud (Amazon EC2), check the CPUUtilization metric. EC2 instances don't send memory metrics to CloudWatch by default, but you can send a custom memory metric.
  • For Amazon ECS tasks, check the CPUUtilization and MemoryUtilization metrics. If either metric reaches 100%, then the task might become unresponsive.
  • For Lambda functions, check the Duration metric. If the Duration value exceeds the load balancer's idle timeout value, then you receive a Gateway timeout error.
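
The timeout comparisons described above can be sketched as follows. The function names are hypothetical. The sketch assumes that TargetResponseTime values are in seconds, that Lambda Duration values are in milliseconds, and that the idle timeout is the 60-second default unless you pass another value.

```python
def share_over_timeout(maximums_seconds: list, idle_timeout_seconds: float = 60.0) -> float:
    """Return the fraction of TargetResponseTime Maximum datapoints
    that reach or exceed the idle timeout."""
    if not maximums_seconds:
        return 0.0
    over = sum(1 for value in maximums_seconds if value >= idle_timeout_seconds)
    return over / len(maximums_seconds)


def lambda_duration_over_timeout(duration_ms: float, idle_timeout_seconds: float = 60.0) -> bool:
    """Return True when a Lambda Duration value (milliseconds)
    is longer than the idle timeout (seconds)."""
    return duration_ms / 1000.0 > idle_timeout_seconds
```

A high share from share_over_timeout suggests that slow target responses, rather than connection failures, are the likely cause of the 504 errors.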

Increase your resource availability

If your targets have high CPU usage, then they might become unresponsive.

To resolve this issue, increase the CPU and memory resources that are available to your targets.

Update your application's code to be more efficient when it responds to HTTP requests. Make sure that the application doesn't take more time to respond than the configured idle timeout period. By default, the idle timeout for an Application Load Balancer is 60 seconds. If needed, then increase the idle timeout of your load balancer.

Note: It's a best practice to increase the idle timeout value only when the target has a large number of compute operations to complete. Otherwise, it's a best practice to optimize your resource usage in the targets instead.
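
If you decide to increase the idle timeout, then the change is a single load balancer attribute. The following sketch builds the parameters for the Elastic Load Balancing ModifyLoadBalancerAttributes API. The function name is hypothetical, and the ARN in the usage note is a placeholder. The sketch assumes the documented valid range of 1-4000 seconds.

```python
def idle_timeout_request(load_balancer_arn: str, seconds: int) -> dict:
    """Build parameters that set idle_timeout.timeout_seconds.

    Raises ValueError when `seconds` is outside the assumed
    valid range of 1-4000 seconds.
    """
    if not 1 <= seconds <= 4000:
        raise ValueError("idle timeout must be between 1 and 4000 seconds")
    return {
        "LoadBalancerArn": load_balancer_arn,
        "Attributes": [
            # Attribute values are passed as strings.
            {"Key": "idle_timeout.timeout_seconds", "Value": str(seconds)},
        ],
    }
```

If you use boto3, then you can apply the change with boto3.client("elbv2").modify_load_balancer_attributes(**idle_timeout_request(arn, 120)), where arn is the Amazon Resource Name (ARN) of your load balancer.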

Scale your targets based on your demand

To scale the targets based on your demand, take the following actions for your configuration:

Note: When you invoke a Lambda function, the function automatically scales.

Check your external dependencies

When an application uses a microservices architecture, external dependencies, such as databases or APIs, can increase the target response time.

Check the following common external dependencies for issues:

  • For Amazon Relational Database Service (Amazon RDS) databases, check the ReadLatency, WriteLatency, and DatabaseConnections CloudWatch metrics.
  • For Amazon Simple Queue Service (Amazon SQS) queues, check the ApproximateAgeOfOldestMessage and NumberOfMessagesDelayed CloudWatch metrics.
  • For Amazon Simple Storage Service (Amazon S3) buckets, check the FirstByteLatency, TotalRequestLatency, 4xxErrors, and 5xxErrors CloudWatch metrics.
  • For Amazon Cognito authentication services, check the TokenRefreshSuccesses CloudWatch metric, and check for ThrottlingException errors.

To determine the source of a performance bottleneck, take the following actions:

Use Compute Optimizer to avoid future issues

Use AWS Compute Optimizer to get insights into your usage of Amazon EC2, Amazon ECS, and Lambda resources. To help you avoid timeouts, Compute Optimizer detects when your resource usage is too high. It also provides best practices about how to optimize your workloads.

Related information

HTTP 504: Gateway timeout

How do I troubleshoot high latency on my Application Load Balancer in Elastic Load Balancing?

How do I troubleshoot authentication issues in my Application Load Balancer?

How do I troubleshoot 504 errors returned when I use a Classic Load Balancer?