Catching throttling exceptions in a Step Function


I have an S3 Batch operation that calls a Lambda. That Lambda initiates a Step Function (A), which in turn calls another Step Function (B) and waits for its response before moving on.

The Lambda is handling rate limiting well – if it gets throttled, it returns a TemporaryFailure status and S3 Batch deals with it just fine. I've had no throttling-related failures on the batch/lambda side because of this.
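For context, here is a minimal sketch of that pattern, not my actual handler. `ThrottledError` and the injected `start_execution` callable are hypothetical stand-ins for whatever call starts Step Function A and whatever error it raises when throttled. The response shape follows the S3 Batch Operations Lambda contract as I understand it.

```python
class ThrottledError(Exception):
    """Hypothetical stand-in for the throttling error raised by the start call."""


def build_batch_response(event, result_code, result_string):
    """Build the result payload S3 Batch Operations expects for one task."""
    task = event["tasks"][0]
    return {
        "invocationSchemaVersion": event["invocationSchemaVersion"],
        "treatMissingKeysAs": "PermanentFailure",
        "invocationId": event["invocationId"],
        "results": [
            {
                "taskId": task["taskId"],
                "resultCode": result_code,
                "resultString": result_string,
            }
        ],
    }


def handle(event, start_execution):
    """Start one execution; report TemporaryFailure if the start is throttled.

    start_execution is a hypothetical callable that takes the S3 key.
    """
    task = event["tasks"][0]
    try:
        start_execution(task["s3Key"])
    except ThrottledError as err:
        # TemporaryFailure asks S3 Batch to retry this task later.
        return build_batch_response(event, "TemporaryFailure", str(err))
    return build_batch_response(event, "Succeeded", "execution started")
```

Passing the starter in as an argument is only there to keep the sketch self-contained; a real handler would call the Step Functions client directly.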

Unfortunately Step Function A fails a good percentage of the time. The reason given is StepFunctions.AWSStepFunctionsException, with the message:

Rate exceeded (Service: AWSStepFunctions; Status Code: 400; Error Code: ThrottlingException; Request ID: [redacted]; Proxy: null)

To be clear, this isn't one step function execution trying to start several others – it's 15,000 separate executions of Step Function A (which all start normally) trying to start 15,000 executions of Step Function B. It's those StartExecution states that are failing.

I suppose I could add a Retrier to my StartExecution state in Step Function A that would catch StepFunctions.AWSStepFunctionsException and retry a bunch of times with a reasonable backoff. But I don't necessarily want to catch all StepFunctions.AWSStepFunctionsExceptions; just the ones that get throttled. I can't figure out how to do that.
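To make the idea concrete, here is a rough sketch of the broad Retrier described above, built in Python and printed as the JSON that would go on the StartExecution state. The interval, attempt count, and backoff rate are illustrative assumptions, not tested values. Note that this version retries every `StepFunctions.AWSStepFunctionsException`, throttled or not, which is exactly the part I'd like to narrow.

```python
import json


def build_retrier(error_names, interval_seconds=2, max_attempts=6, backoff_rate=2.0):
    """Return one Amazon States Language Retrier as a plain dict.

    The default numbers are illustrative assumptions only.
    """
    return {
        "ErrorEquals": list(error_names),
        "IntervalSeconds": interval_seconds,
        "MaxAttempts": max_attempts,
        "BackoffRate": backoff_rate,
    }


def backoff_schedule(retrier):
    """List the wait (in seconds) before each retry attempt, ignoring jitter."""
    return [
        retrier["IntervalSeconds"] * retrier["BackoffRate"] ** attempt
        for attempt in range(retrier["MaxAttempts"])
    ]


# Broad retrier: matches all errors with this name, not only throttling.
retrier = build_retrier(["StepFunctions.AWSStepFunctionsException"])

if __name__ == "__main__":
    print(json.dumps({"Retry": [retrier]}, indent=2))
    print("waits (s):", backoff_schedule(retrier))
```

With these assumed numbers, the six retries wait 2, 4, 8, 16, 32, and 64 seconds, so a single execution would spend at most about two minutes backing off before failing.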

mbklein
asked 1 year ago · 986 views
1 Answer

Would increasing this quota solve most of the throttling issue? See "Quotas related to API action throttling".

kentrad (AWS, Expert)
answered 1 year ago
  • I looked into a quota increase, but that seems a bit like a cop-out on our end, and also potentially unreliable. My experience has been that rate limiting and error handling are far more scalable than adding capacity.

    Rather than increasing the quota and hoping we don't exceed the new one, I'd much rather do some rate limiting and/or retry on throttling in the Step Function itself. Some (not all) of this can be achieved with batching when the Step Function is mapping over a list of inputs, but in this case, it's thousands of simultaneous executions processing one input each.

  • I was thinking that a quota increase would make the throttling exception a true exception: it would still need to be handled, but it would no longer be the normal case. Sorry, I didn't really answer your original question.
