API Gateway Returning Server 500 Error Intermittently


Hi, I am struggling to debug an intermittent issue in my web app (hosted on Amplify). The app displays images, and when the page opens, a GET request is made for each image on the page.

Essentially, each image load is split into two calls, followed by rendering:

1. getImageLocation: API Gateway -> Lambda -> DynamoDB, which returns the image's location path in S3.
2. getImage: fetch the image straight from S3.
3. Display the image in the web app (Amplify).

The problem is that about half the time everything works, and the other half a number of images fail to load, seemingly at random. Sometimes 1 image fails, sometimes 5; if you refresh enough times, all of the images eventually load. Here is an example: [Screenshot: Failed to Load]

Then I refresh the page and they all load: [Screenshot: Load Successfully]

The browser console reports a CORS error, but since the requests succeed most of the time, CORS must be configured correctly (with the wildcard '*' and the associated headers). My only guess is some kind of timing issue, with something running too fast or too slow. [Screenshot: CORS Issue?]

UPDATE

Just adding: I have configured CloudWatch logging for the API so that I can catch the failing requests, but the "error": "$context.integrationErrorMessage" and "message": "$context.error.message" fields in my log format provide no useful insight, as shown below:

{
    "requestId": "976f543a-e191-421f-9a2e-0ceb9f45a418",
    "extendedRequestId": "av_O-EniywMFVvw=",
    "ip": "121.223.130.209",
    "caller": "-",
    "user": "-",
    "requestTime": "11/Jul/2024:13:33:32 +0000",
    "httpMethod": "GET",
    "resourcePath": "/image/latest",
    "status": "500",
    "protocol": "HTTP/1.1",
    "responseLength": "36",
    "error": "-",
    "message": "Internal server error"
}
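To measure how often the 500s occur per route, entries like the one above can be tallied. This is a minimal sketch, assuming the log events have been exported as one JSON object per line; the helper name `tally_statuses` is illustrative and not part of any AWS SDK.

```python
import json
from collections import Counter


def tally_statuses(log_lines):
    """Count (resourcePath, status) pairs across JSON access-log entries."""
    counts = Counter()
    for line in log_lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip blank lines and anything that is not JSON
        if not isinstance(entry, dict):
            continue  # skip JSON values that are not log objects
        key = (entry.get("resourcePath", "-"), entry.get("status", "-"))
        counts[key] += 1
    return counts
```

Comparing the count for `("/image/latest", "500")` against the count for `("/image/latest", "200")` gives a rough failure rate for that route.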
2 Answers
Accepted Answer

I have resolved my issue. It came down to the number of concurrent calls being made to my Lambda function. I had read online that the AWS default quota for Lambda concurrent executions is 1,000, so it didn't cross my mind to actually check it.

However, after a lot of digging I found that my account's concurrent executions limit was in fact 10: [Screenshot: 10 Concurrent Executions]

This meant that whenever more than 10 GET requests hit the API concurrently, the excess Lambda invocations were throttled with a 429, which I could see in the CloudWatch logs:

(daXXXd9d-92d92-4dwdc-bwad1-7ewaf9dwafc) Lambda invocation failed with status: 429. Lambda request id: c525XXXb8-9a54-4f2b-XXb0-7efXX7XX316a
(daXXXd9d-92d92-4dwdc-bwad1-7ewaf9dwafc) Execution failed due to configuration error: Rate Exceeded.
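The account-level limit can also be read programmatically rather than through the console. This is a minimal sketch, assuming boto3 is installed and AWS credentials are configured; `concurrency_limits` is an illustrative helper name, while `get_account_settings` is the Lambda API call that reports both values.

```python
def concurrency_limits(lambda_client):
    """Return (account limit, unreserved limit) for concurrent executions.

    `lambda_client` is expected to be a boto3 Lambda client, e.g.
    boto3.client("lambda"); it is passed in so the helper can be
    exercised without AWS access.
    """
    settings = lambda_client.get_account_settings()
    limit = settings["AccountLimit"]
    return (
        limit["ConcurrentExecutions"],
        limit["UnreservedConcurrentExecutions"],
    )


# Example usage (needs credentials and a region):
#   import boto3
#   total, unreserved = concurrency_limits(boto3.client("lambda"))
#   print(total, unreserved)
```

A value far below 1,000 here would point to the same throttling seen in the log lines above.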

To resolve this I had to use AWS Service Quotas to request an increase from 10 concurrent executions to something larger (50 in my case), which took a few hours to get approved.

Because my web app sends a GET request for every image on the page, the more images I added, the more often I saw the error, which now makes sense. It also suggests that my architecture is flawed and that I should bundle the multiple GET requests into one single request to prevent this issue from occurring.
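One way the bundling could look is a single Lambda invocation that resolves every image location with DynamoDB's `BatchGetItem`. This is only a sketch under assumptions: the table name, key name, attribute name, the `ids` query parameter and the Lambda proxy response shape are all hypothetical and not taken from the original setup.

```python
import json

TABLE_NAME = "ImageLocations"  # hypothetical table name
KEY_NAME = "imageId"           # hypothetical partition key (string)
LOCATION_ATTR = "location"     # hypothetical attribute holding the S3 path
MAX_KEYS = 100                 # BatchGetItem accepts at most 100 keys per call


def get_locations(dynamodb_client, image_ids):
    """Resolve many image IDs to S3 locations with batched reads."""
    locations = {}
    ids = list(dict.fromkeys(image_ids))  # BatchGetItem rejects duplicate keys
    for start in range(0, len(ids), MAX_KEYS):
        chunk = ids[start:start + MAX_KEYS]
        request = {TABLE_NAME: {"Keys": [{KEY_NAME: {"S": i}} for i in chunk]}}
        while request:
            response = dynamodb_client.batch_get_item(RequestItems=request)
            for item in response.get("Responses", {}).get(TABLE_NAME, []):
                locations[item[KEY_NAME]["S"]] = item[LOCATION_ATTR]["S"]
            # Retry whatever DynamoDB could not process; a real handler
            # would add a backoff delay between attempts.
            request = response.get("UnprocessedKeys") or None
    return locations


def handler(event, context, dynamodb_client=None):
    """Lambda proxy handler: GET ...?ids=a,b,c returns {id: location}."""
    if dynamodb_client is None:
        import boto3  # imported lazily so the module loads without boto3
        dynamodb_client = boto3.client("dynamodb")
    params = event.get("queryStringParameters") or {}
    ids = [i for i in params.get("ids", "").split(",") if i]
    return {
        "statusCode": 200,
        "headers": {"Access-Control-Allow-Origin": "*"},
        "body": json.dumps(get_locations(dynamodb_client, ids)),
    }
```

With this shape, a page showing 20 images would trigger one Lambda invocation instead of 20, so the concurrency limit is much harder to hit.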

Since the quota increase I have had no issues.

vleppi
answered 3 months ago

Hi,

Based on the documentation: https://docs.aws.amazon.com/apigateway/latest/developerguide/supported-gateway-response-types.html

API_CONFIGURATION_ERROR	500	
The gateway response for an invalid API configuration—including when an invalid endpoint 
address is submitted, when base64 decoding fails on binary data when binary support is enacted, 
or when integration response mapping can't match any template and no default template is 
configured. If the response type is unspecified, this response defaults to the DEFAULT_5XX type.

Are you in one of those situations when it fails?

Best,

Didier

AWS
EXPERT
answered 3 months ago
  • I doubt it's the endpoint, because it's the same one every time (when it works and when it doesn't). It can't be the base64 decoding, because the error seems to come from the first GET request, which only goes to DynamoDB (not actually fetching the image). And again, I don't see how it could be the response mapping, because it's configured in a way that works.

    None of these error states really give me insight into why it would be intermittent...

    Or maybe the error shown on screen only makes it look like the first GET request is failing, when the real failure is in fetching the binary data from S3?
