Intermittent 503 on API Gateway


Two days ago, I started getting 503 "Service Unavailable" responses from my API Gateway/Lambda calls. This is the first time I've experienced this. It happens about 30-40% of the time right now. It is NOT just one endpoint. It happens on all of them randomly. Even my simple "hello" endpoint, which uses next to no resources, gets the intermittent 503 "Service Unavailable". So it is NOT that I am using too many resources. I am the only one using the API right now and I'm testing it once every 3 seconds. So there is NO WAY I am overloading it beyond its capacity.

I did NOT change any code before this started happening. It just started happening two days ago and has continued until now. According to the AWS Health Dashboard, they are reporting no problems. But I AM having a problem and it is critically affecting my ability to do my job.

I checked the logs, but they don't tell me anything I don't already know.

We are paying for "Developer Support" and submitted a ticket 24 hours ago, but I've heard nothing back. The SLA is supposed to be 12 hours.

WHAT IS GOING ON?!?

2 Answers
Accepted Answer

I finally figured out what was happening!!!

I inadvertently created an infinite loop which was taking up so many resources that it interfered with the API Gateway's ability to process all requests.

How did I create an infinite loop? I had a Lambda that was triggered by an ObjectCreated event in an S3 bucket. The big problem was that the Lambda wrote a new file to that same bucket, which then started the process over again. Infinitely. (I fixed it with one line of code. :))
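The one-line fix isn't shown here, so what follows is only a minimal sketch of one common guard, not necessarily the change that was actually made. It assumes the Lambda writes its output under a dedicated key prefix and skips any event for an object that already has that prefix. The names `OUTPUT_PREFIX`, `keys_to_process`, and `handler` are hypothetical, and the actual S3 write is left as a comment.

```python
from urllib.parse import unquote_plus

# Hypothetical prefix under which this Lambda writes its own output.
OUTPUT_PREFIX = "processed/"


def keys_to_process(event):
    """Return the object keys in an S3 event that this Lambda should handle.

    Keys that already start with OUTPUT_PREFIX were written by this Lambda,
    so they are skipped to avoid re-triggering the same work forever.
    """
    keys = []
    for record in event.get("Records", []):
        # Object keys in S3 event notifications are URL-encoded.
        key = unquote_plus(record["s3"]["object"]["key"])
        if key.startswith(OUTPUT_PREFIX):
            continue  # our own output: do not process it again
        keys.append(key)
    return keys


def handler(event, context):
    written = []
    for key in keys_to_process(event):
        output_key = OUTPUT_PREFIX + key
        # The real function would write the new object to S3 here,
        # under output_key, so the next event for it is skipped above.
        written.append(output_key)
    return written
```

Writing the output to a different bucket, or restricting the S3 trigger itself to an input prefix in the event notification configuration, avoids the loop without relying on a check in the code.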

So if anyone else is having a problem with intermittent 503s on the API Gateway, maybe check whether any of your backend processes are creating an infinite loop and thus taking processing power away from the API Gateway. I'm a little embarrassed, of course, but if it helps someone else in the future, it will be worth it.

answered 6 months ago
reviewed a month ago

Are there logs in CloudTrail? Did you enable execution and access logs for the API Gateway stage?

answered 6 months ago
reviewed a month ago
  • Thank you for responding.

    I ended up rolling my backend code back three versions and then redeploying them one at a time, looking for any indication of the problem. I got the code all the way back to the present version and I do not see the problem anymore. I will keep monitoring it. But it looks fixed...
