How precise is API Gateway throttling?


Our team has started working with stage-based API Gateway throttling and noticed some irregular behavior. To prevent our system from being overrun with API requests, we've settled on a throttling limit of 20 requests per second. After enabling this limit we wanted to ensure the throttling was actually working, so I built a simple JMeter suite to bombard the Invoke URL with requests:

  1. Start 100 test threads
  2. Synchronize threads so that all 100 are running before a request is made
  3. All 100 make a simple POST to the gateway.
  4. Check for 200 vs 429 Responses on each thread

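Outside of JMeter, the same four steps can be sketched with a small Python driver (the sender callable is a placeholder for the actual POST to the Invoke URL):

```python
import threading
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def run_load_test(n_threads, send_request):
    """Steps 1-4: start n_threads workers, hold them at a barrier so all
    fire simultaneously, then tally the status code each one receives."""
    barrier = threading.Barrier(n_threads)

    def worker(_):
        barrier.wait()           # step 2: wait until every thread is ready
        return send_request()    # step 3: one POST per thread

    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        return Counter(pool.map(worker, range(n_threads)))  # step 4

# A real sender might look like this (INVOKE_URL is a placeholder):
# import urllib.request, urllib.error
# def send_post():
#     req = urllib.request.Request(INVOKE_URL, data=b"{}", method="POST")
#     try:
#         with urllib.request.urlopen(req) as resp:
#             return resp.status
#     except urllib.error.HTTPError as err:
#         return err.code      # e.g. 429 when throttled
```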
Based on the gateway's documentation, my expectation was that 20 requests would get "200 OK" responses and the remaining 80 would get 429s. Instead, I get an unpredictable number of blocked requests, vastly outside the set limit. Usually around 80 requests get through and only 20 get blocked, but sometimes all 100 requests get through. The token-bucket model the gateway operates under also means that I shouldn't have to worry about straddling a 1-second timing window, where 40 requests might sneak through as a timer ticks over to a new second interval.
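
A single-host token bucket with our stage settings (rate 20, burst 20; the class itself is just my sketch, not gateway code) would behave like this:

```python
class TokenBucket:
    """Single-host token bucket: refills at `rate` tokens/second,
    holds at most `burst` tokens."""
    def __init__(self, rate, burst):
        self.rate, self.burst = rate, burst
        self.tokens = burst          # bucket starts full
        self.last = 0.0

    def allow(self, now):
        # refill for the elapsed time, capped at the burst size
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True              # -> 200
        return False                 # -> 429

bucket = TokenBucket(rate=20, burst=20)
results = [bucket.allow(now=0.0) for _ in range(100)]  # 100 simultaneous requests
print(results.count(True), results.count(False))       # -> 20 80
```

Because refill is continuous rather than tied to wall-clock second boundaries, there is no window where 40 requests could slip through at once.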

Is our throttling limit just too small for the gateway to notice a difference? With default values in the few thousands, I can see how 20 might be several orders of magnitude too low to register. Or are we simply misinterpreting how the gateway handles throttling?

TL;DR: Using a throttling limit of 20 on an API Gateway stage is not limiting API requests as precisely as we had expected.

  • What happens if you test with thousands of threads?

asked 3 years ago · 3,388 views
1 Answer
Accepted Answer

As mentioned in the last answer on this Stack Overflow thread, it's likely because AWS API Gateway is Well-Architected: there is no single back-end endpoint servicing your requests. As that post's author notes, when a delay is added between requests the throttle is applied more consistently.

The problems you highlight exist for any service that uses a distributed, Well-Architected setup for throttling. Unless all the requests for a given throttling "key" land on the same host at the same moment, the system cannot provide perfectly consistent throttling.

In the scenario you've posited, assuming the requests are distributed across multiple backend hosts, these theoretical buckets will go negative once the hosts in the throttle group share what they each admitted, which likely takes a few seconds. The bucket is then replenished at the configured rate as requests are handled or rejected, but while the bucket is negative, all requests for that key will result in a 429 response.
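
As a toy illustration of that over-admit-then-go-negative behavior (purely hypothetical numbers: 5 hosts, one 20-token throttle key, requests spread round-robin, no reconciliation until afterward):

```python
HOSTS, LIMIT = 5, 20
local_tokens = [LIMIT] * HOSTS   # each host starts believing the bucket is full

admitted = 0
for i in range(100):             # 100 simultaneous requests, round-robin
    host = i % HOSTS
    if local_tokens[host] >= 1:  # each host admits against its own copy
        local_tokens[host] -= 1
        admitted += 1

# Reconciliation: the shared bucket pays for every local admission.
shared = LIMIT - admitted
print(admitted, shared)          # -> 100 -80
```

In this toy model all 100 requests get through, and the reconciled bucket sits at -80; refilling at 20 tokens/second, it would be roughly 4 seconds before any request for that key sees anything but a 429 again.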

AWS
answered 3 years ago
reviewed 2 months ago
  • Thanks, the Well-Architected paradigm is something I've got to bake into my mental model of most AWS products.

    I'm curious to see how accurate the burst limiting is, given these facts. If it's almost guaranteed that the bucket will "go negative", I'll try and see how negative it goes at different magnitudes:

    Limit | Burst | Test Requests/s

    We had to push this work out to a future sprint, given the discovery of the throttling behavior, so I won't be able to fully experiment for a week or so. But I promise to come back with some test results!
