
ECS services return 504 timeouts when making requests to AWS and third-party services


I have 2 services in ECS, a React frontend service and a Go backend service. Requests are routed successfully between the frontend and backend using the ALB. The EC2 instances are currently running in public subnets.

I am receiving a 504 timeout error when trying to perform operations such as:

  • PutItem in DynamoDB
  • Retrieving token signing keys from Cognito
  • Redirecting to Stripe's Hosted Checkout UI

This 504 timeout only exists in AWS. I am successfully able to perform all of these tasks locally and in Docker containers. The ALB seems to be routing traffic on port 80 with a listener successfully, the target groups appear healthy, and the security groups are very lenient at the moment.

Because everything works outside of AWS with minimal permissions and little performance overhead, I think IAM roles, resource allocation and application-level code are unlikely to be the cause. I don't have any WAF ACLs configured at the moment.

I'm confident this shared 504 error with multiple different requests is caused by an AWS misconfiguration perhaps with ECS, EC2, VPC, Security Groups or the ALB. I would be very grateful if you could help me resolve this issue.

1 Answer

Based on the information provided, your ECS services appear to have network connectivity problems when calling AWS services and third-party services. An ALB returns 504 Gateway Timeout when a target fails to respond in time, which is consistent with the backend's outbound calls hanging instead of completing or failing quickly. Given that your EC2 instances are running in public subnets, there are a few potential causes and solutions to consider:

  1. VPC Configuration: Even though your EC2 instances are in public subnets, they may not have proper internet connectivity. Ensure that:

    • The public subnets have a route table entry that points to an Internet Gateway.
    • The EC2 instances have public IP addresses assigned.
  2. Security Group Configuration: While you mentioned that the security groups are lenient, double-check that:

    • The security group associated with your ECS tasks allows outbound traffic to the required services (DynamoDB, Cognito, Stripe).
    • The security group for the ALB allows inbound traffic from your clients and outbound traffic to your ECS tasks.
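  3. ECS Task Networking: Verify that your ECS tasks are configured to use the correct network mode (awsvpc for Fargate or EC2 with ENIs, bridge for EC2 without ENIs). Note that on the EC2 launch type, a task using awsvpc gets its own ENI without a public IP address, so it cannot reach the internet through an Internet Gateway even when the instance is in a public subnet. In that case the task needs a NAT Gateway (see item 4).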

  4. NAT Gateway: If your services need to access the internet but you want to keep them in private subnets, consider setting up a NAT Gateway and adjusting your route tables accordingly.

  5. DNS Resolution: Ensure that DNS resolution is working correctly within your VPC. Check that the DHCP options set for your VPC includes the Amazon-provided DNS server.

  6. Timeouts and Retries: Review and adjust the timeout settings in your application code, especially for external service calls. Implement proper retry mechanisms with exponential backoff for transient failures.

  7. CloudWatch Logs: Enable detailed CloudWatch logs for your ECS tasks and examine them for any network-related errors or timeouts.

  8. VPC Flow Logs: Enable VPC Flow Logs to analyze network traffic patterns and identify any potential connectivity issues.

  9. Load Balancer Timeout: Check the idle timeout setting on your Application Load Balancer. The default is 60 seconds, which might be too short for some long-running operations.
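One way to narrow down which of the checks above applies is to test outbound reachability from inside the running task. The following is a minimal sketch, not a definitive diagnostic: the function names are hypothetical, the region is an assumption you would replace with your own, and the three hosts are the public endpoints for the services named in the question.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"time"
)

// endpointsFor lists host:port pairs for the three dependencies named in the
// question. The region argument is an assumption; pass your own.
func endpointsFor(region string) []string {
	return []string{
		"dynamodb." + region + ".amazonaws.com:443",
		"cognito-idp." + region + ".amazonaws.com:443",
		"api.stripe.com:443",
	}
}

// probe resolves addr and attempts a TCP handshake within the timeout.
func probe(ctx context.Context, addr string, timeout time.Duration) error {
	ctx, cancel := context.WithTimeout(ctx, timeout)
	defer cancel()
	var d net.Dialer
	conn, err := d.DialContext(ctx, "tcp", addr)
	if err != nil {
		return err
	}
	return conn.Close()
}

// report probes every endpoint and describes each result on one line.
func report(ctx context.Context, region string, timeout time.Duration) []string {
	var lines []string
	for _, addr := range endpointsFor(region) {
		if err := probe(ctx, addr, timeout); err != nil {
			lines = append(lines, fmt.Sprintf("FAIL %s: %v", addr, err))
			continue
		}
		lines = append(lines, "OK   "+addr)
	}
	return lines
}
```

Calling report(context.Background(), "us-east-1", 3*time.Second) from a temporary debug endpoint or at startup and logging the lines would show where the calls stall. A name lookup error typically points at DNS (item 5), while a dial timeout points at routing, public IP assignment or security groups (items 1 to 4).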

To resolve this issue:

  1. Review and correct your VPC networking configuration, ensuring proper routes to the internet gateway.
  2. Verify security group rules to allow necessary outbound traffic.
  3. Consider using a NAT Gateway if you want to move your instances to private subnets.
  4. Implement and tune appropriate timeout and retry logic in your application code.
  5. Use CloudWatch Logs and VPC Flow Logs to diagnose specific connectivity issues.
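For step 4, retry logic with exponential backoff could look roughly like the sketch below. It uses only the Go standard library; retryWithBackoff and errTransient are hypothetical names introduced for illustration, and how you decide that an error is transient is an assumption that depends on the client you call.

```go
package main

import (
	"errors"
	"time"
)

// errTransient is a hypothetical sentinel that marks a failure as retryable.
var errTransient = errors.New("transient failure")

// retryWithBackoff calls fn up to attempts times. After each transient
// failure it sleeps, doubling the wait each time (base, 2*base, 4*base, ...).
// Any other error is returned immediately without further attempts.
func retryWithBackoff(attempts int, base time.Duration, fn func() error) error {
	if attempts < 1 {
		attempts = 1
	}
	var err error
	delay := base
	for i := 0; i < attempts; i++ {
		if err = fn(); err == nil {
			return nil
		}
		if !errors.Is(err, errTransient) {
			return err
		}
		if i < attempts-1 {
			time.Sleep(delay)
			delay *= 2
		}
	}
	return err
}
```

Retries only help with intermittent failures. If every outbound call times out, as described in the question, the networking steps are the place to look first. The AWS SDK for Go also retries some failures on its own, so a wrapper like this is mostly relevant for calls you make yourself.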

By systematically checking and adjusting these configurations, you should be able to resolve the 504 timeout errors and establish proper connectivity between your ECS services and the required AWS and third-party services.
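While diagnosing, it can also help to have the Go service answer before the load balancer gives up, so that a stalled outbound call shows up as an error from your own service and not as a bare 504 from the ALB. This is a minimal sketch using the standard library's http.TimeoutHandler; the 5 second margin and the names handlerDeadline and withDeadline are assumptions for illustration.

```go
package main

import (
	"net/http"
	"time"
)

const (
	albIdleTimeout = 60 * time.Second // ALB default idle timeout mentioned above
	safetyMargin   = 5 * time.Second  // hypothetical headroom, tune as needed
)

// handlerDeadline is the longest a handler may run before the service
// replies on its own.
func handlerDeadline() time.Duration {
	return albIdleTimeout - safetyMargin
}

// withDeadline wraps h so that a request still running at the deadline
// receives a 503 with a readable message from the service itself.
func withDeadline(h http.Handler) http.Handler {
	return http.TimeoutHandler(h, handlerDeadline(), "upstream call timed out")
}
```

Wrapping the router, for example http.ListenAndServe(":8080", withDeadline(mux)) with a hypothetical mux and port, would make the timeout visible in the service's own responses and logs. It does not fix the underlying connectivity problem.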

answered a year ago
EXPERT
reviewed a year ago
