ApiGateway returning 503 seemingly without hitting the target backend

I am using API Gateway in front of a Next.js container running in ECS. API Gateway sometimes returns 503 "Service Unavailable" even though the backend is running with enough resources. When this happens, the service logs nothing, and the access log reports an integrationLatency of 0 ms and a missing integrationStatus. It happens only sporadically and lasts a second or two. Refreshing the page after those two seconds resolves the issue until it happens again.

The facts:

  • CPU max in the last week is 40% utilization, memory usage tops at 21%
  • ECS events: "service NextjsService has reached a steady state." every 6 hours for the last week. No interruption
  • ECS Tasks: started 7 days ago. No interruptions.
  • The service logs are completely empty, with no success or failure entries. When API Gateway does not return 503 instantly, the service logs indicate success or failure depending on the result. This is the first clue that the service is not hit at all.
  • The 503 error disappears after refreshing the page a couple of times.

This is what a healthy request looks like:

{
    "requestTime": "\"18/Feb/2025:15:29:34 +0000\"",
    "requestId": "\"GL8W0jwYiGYEMnQ=\"",
    "httpMethod": "\"GET\"",
    "path": "\"/_next/static/chunks/5136-9d8d2341ac9832fd.js\"",
    "routeKey": "\"$default\"",
    "status": "200",
    "responseLatency": "140",
    "integrationRequestId": "\"-\"",
    "integrationStatus": "\"200\"",
    "integrationLatency": "\"133\"",
    "integrationServiceStatus": "\"200\"",
    "ip": "<redacted>",
    "userAgent": "<redacted>"
}

This is a "Service Unavailable" 503. Notice that integrationStatus is absent (logged as "-") and integrationLatency is 0:

{
    "requestTime": "\"18/Feb/2025:15:29:40 +0000\"",
    "requestId": "\"GL8XzgX8CGYEMRw=\"",
    "httpMethod": "\"GET\"",
    "path": "\"/sign-in\"",
    "routeKey": "\"$default\"",
    "status": "503",
    "responseLatency": "6",
    "integrationRequestId": "\"-\"",
    "integrationStatus": "\"-\"",
    "integrationLatency": "\"0\"",
    "integrationServiceStatus": "\"200\"",
    "ip": "<redacted>",
    "userAgent": "<redacted>"
}

So my questions are:

  • What could be the reason for the integrationStatus field missing?
  • Is 0 ms latency really enough time to label a service as unavailable if the ECS container is running?
  • Where does the 503 status come from if integrationServiceStatus returns 200?

1 Answer

Based on the information provided, it appears that the 503 "Service Unavailable" error you're experiencing is likely occurring at the API Gateway level before the request reaches your backend ECS container. Here's an analysis of the situation:

  1. Missing integrationStatus and 0ms latency: The absence of the integrationStatus field and the 0ms integrationLatency suggest that API Gateway is not successfully initiating the integration request to your backend. This could indicate that API Gateway is having issues establishing a connection with your ECS service.

  2. 503 status vs. integrationServiceStatus 200: The 503 status is being returned by API Gateway itself, not your backend service. The integrationServiceStatus of 200 in this context is likely a default value and doesn't represent an actual response from your ECS container, since the integration doesn't seem to have been initiated.

  3. Sporadic nature and quick resolution: The fact that the issue resolves itself quickly and occurs sporadically points to a temporary problem, possibly related to API Gateway's internal processes or networking issues.
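
The distinction drawn in the points above can be checked mechanically against the access log entries. The following is a minimal sketch, assuming entries shaped like the two samples in the question (values wrapped in an extra pair of literal quotes, and "-" logged when a field is absent). The function names `unquote` and `classify` are hypothetical helpers, not part of any AWS SDK.

```python
def unquote(value: str) -> str:
    """Strip the extra literal quotes that the log format wraps around values."""
    return value.strip('"')


def classify(entry: dict) -> str:
    """Label an access log entry by where its 503 appears to originate."""
    status = unquote(entry.get("status", ""))
    integration_status = unquote(entry.get("integrationStatus", "-"))
    if status == "503" and integration_status in ("", "-"):
        # No integration status was recorded, so the backend never answered.
        return "gateway-generated 503 (no integration response)"
    if status == "503":
        return "503 after an integration response"
    return "other"


# Trimmed copies of the two sample entries from the question.
healthy = {"status": "200", "integrationStatus": '"200"', "integrationLatency": '"133"'}
failed = {"status": "503", "integrationStatus": '"-"', "integrationLatency": '"0"'}

print(classify(healthy))
print(classify(failed))
```

Running a filter like this over a day of access logs would show how often the gateway-generated case occurs and whether it clusters around particular times.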

Possible reasons for this behavior:

  1. Temporary network issues between API Gateway and your ECS service.
  2. API Gateway might be experiencing brief internal issues or resource constraints.
  3. There could be a configuration problem in your API Gateway setup, such as incorrect endpoint URLs or integration settings.
  4. If you're using VPC Link, there might be intermittent issues with the VPC Link connection.

To troubleshoot this issue:

  1. Review your API Gateway configuration, especially the integration settings and endpoint URLs.
  2. Check if you're hitting any API Gateway limits or throttles.
  3. Investigate if there are any VPC or networking issues if you're using VPC Link.
  4. Monitor AWS service health dashboards for any reported issues in your region.
  5. Consider implementing retry logic in your client applications to handle these sporadic 503 errors.
  6. If the issue persists, you may need to contact AWS Support for a more in-depth investigation, as they can access additional logs and provide more specific guidance based on your account and API configuration.
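
The retry suggestion in step 5 could look roughly like the sketch below. It assumes an idempotent GET request, retries only on a 503 status, and uses exponential backoff. The names `fetch_status` and `get_with_retry`, the attempt count, and the delays are illustrative assumptions rather than recommended values.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def fetch_status(url: str) -> int:
    """Perform a GET and return the HTTP status code, including error codes."""
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code


def get_with_retry(
    url: str,
    fetch: Callable[[str], int] = fetch_status,
    attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Retry a GET while it returns 503, doubling the delay between attempts."""
    status = 0
    for attempt in range(attempts):
        status = fetch(url)
        if status != 503:
            return status
        if attempt < attempts - 1:
            # Delays of 0.5 s, 1 s, ... with the default base_delay.
            sleep(base_delay * 2 ** attempt)
    return status


# Example call (hypothetical URL):
# status = get_with_retry("https://example.invalid/sign-in")
```

Since the failures described in the question last only a second or two, a small number of attempts with short delays would likely be enough. Retrying non-idempotent requests such as form submissions needs more care.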

Remember that API Gateway often returns generic error messages to clients for security reasons. Because the backend logs are empty in this case, more detailed error information is more likely to be found in the API Gateway access logs in CloudWatch, which you can access as the API owner.
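
One way to get more detail on gateway-generated errors is to add error variables to the access log format. The sketch below builds such a format string. It assumes an HTTP API (the `$default` route key in the question suggests one) and uses the `$context.integrationErrorMessage` and `$context.error.message` logging variables. The set of fields is an illustrative choice, not the format used in the question.

```python
import json

# Access log format with two error fields added. Each value is an API Gateway
# logging variable that is substituted per request.
log_format = json.dumps(
    {
        "requestId": "$context.requestId",
        "status": "$context.status",
        "integrationStatus": "$context.integrationStatus",
        "integrationLatency": "$context.integrationLatency",
        "integrationErrorMessage": "$context.integrationErrorMessage",
        "errorMessage": "$context.error.message",
    }
)

print(log_format)
```

The printed string can be pasted into the stage's access logging settings. With these fields present, the next sporadic 503 should carry a message that indicates why the integration request was not made.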

answered a year ago
