Catching throttling exceptions in a Step Function

I have an S3 Batch operation that calls a Lambda. That Lambda initiates a Step Function (A), which in turn calls another Step Function (B) and waits for its response before moving on.

The Lambda is handling rate limiting well – if it gets throttled, it returns a TemporaryFailure status and S3 Batch deals with it just fine. I've had no throttling-related failures on the batch/lambda side because of this.
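For readers unfamiliar with that pattern, here is a minimal sketch of how a Lambda can report throttling back to S3 Batch Operations. It assumes invocation schema version 1.0 with one task per invocation. `start_execution` and `is_throttled` are hypothetical injected helpers, not taken from the original Lambda; in practice they would wrap the Step Functions client call and inspect its error code.

```python
from typing import Any, Callable, Dict


def batch_response(event: Dict[str, Any], result_code: str, message: str) -> Dict[str, Any]:
    """Build the response S3 Batch Operations expects for a single-task invocation."""
    task = event["tasks"][0]
    return {
        "invocationSchemaVersion": event["invocationSchemaVersion"],
        "treatMissingKeysAs": "PermanentFailure",
        "invocationId": event["invocationId"],
        "results": [
            {"taskId": task["taskId"], "resultCode": result_code, "resultString": message}
        ],
    }


def handle(
    event: Dict[str, Any],
    start_execution: Callable[[Dict[str, Any]], None],  # hypothetical: starts Step Function A
    is_throttled: Callable[[Exception], bool],          # hypothetical: detects a throttling error
) -> Dict[str, Any]:
    """Start the execution; map throttling to TemporaryFailure so S3 Batch retries the task."""
    try:
        start_execution(event["tasks"][0])
    except Exception as err:
        if is_throttled(err):
            return batch_response(event, "TemporaryFailure", "throttled; retry later")
        return batch_response(event, "PermanentFailure", str(err))
    return batch_response(event, "Succeeded", "execution started")
```

Returning `TemporaryFailure` hands the retry decision back to S3 Batch, which is why the Lambda side never surfaces throttling as a job failure.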

Unfortunately Step Function A fails a good percentage of the time. The reason given is StepFunctions.AWSStepFunctionsException, with the message:

Rate exceeded (Service: AWSStepFunctions; Status Code: 400; Error Code: ThrottlingException; Request ID: [redacted]; Proxy: null)

To be clear, this isn't one step function execution trying to start several others – it's 15,000 separate executions of Step Function A (which all start normally) trying to start 15,000 executions of Step Function B. It's those StartExecution states that are failing.

I suppose I could add a Retrier to my StartExecution state in Step Function A that would catch StepFunctions.AWSStepFunctionsException and retry a bunch of times with a reasonable backoff. But I don't necessarily want to catch all StepFunctions.AWSStepFunctionsExceptions; just the ones that get throttled. I can't figure out how to do that.
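To make the Retrier idea concrete, here is a rough sketch that builds the `Retry` fragment for the StartExecution state and prints the worst-case wait before each attempt. The interval, attempt count, backoff rate and delay cap are illustrative assumptions, not recommendations. Note that it matches on the error name only, so it retries every StepFunctions.AWSStepFunctionsException, which is exactly the limitation I'm asking about.

```python
import json
from typing import Any, Dict, Iterable, List


def build_retrier(
    error_names: Iterable[str],
    interval_seconds: int = 2,     # assumed starting delay
    max_attempts: int = 8,         # assumed retry budget
    backoff_rate: float = 2.0,     # delay multiplier per attempt
    max_delay_seconds: int = 60,   # assumed cap on any single delay
) -> Dict[str, Any]:
    """Build one Amazon States Language retrier with capped exponential backoff."""
    return {
        "ErrorEquals": list(error_names),
        "IntervalSeconds": interval_seconds,
        "MaxAttempts": max_attempts,
        "BackoffRate": backoff_rate,
        "MaxDelaySeconds": max_delay_seconds,
        "JitterStrategy": "FULL",  # spread retries so executions do not retry in lockstep
    }


def max_delays(retrier: Dict[str, Any]) -> List[float]:
    """Upper bound on the wait before each retry, before jitter is applied."""
    delays = []
    for attempt in range(retrier["MaxAttempts"]):
        delay = retrier["IntervalSeconds"] * retrier["BackoffRate"] ** attempt
        delays.append(min(delay, retrier["MaxDelaySeconds"]))
    return delays


if __name__ == "__main__":
    retrier = build_retrier(["StepFunctions.AWSStepFunctionsException"])
    # Fragment to merge into the StartExecution state of Step Function A.
    print(json.dumps({"Retry": [retrier]}, indent=2))
    print(max_delays(retrier))
```

With thousands of executions retrying at once, the jitter setting probably matters more than the exact backoff numbers, since it keeps the retries from arriving in synchronized waves.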

mbklein
asked a year ago · 987 views
1 Answer
Would increasing this quota solve most of this throttling issue? See "Quotas related to API action throttling" in the Step Functions documentation.

AWS EXPERT
kentrad
answered a year ago
  • I looked into a quota increase, but that seems a bit like a cop-out on our end, and also potentially unreliable. My experience has been that rate limiting and error handling are far more scalable than adding capacity.

    Rather than increasing the quota and hoping we don't exceed the new one, I'd much rather do some rate limiting and/or retry throttled calls in the Step Function itself. Some (not all) of this can be achieved with batching when the Step Function is mapping over a list of inputs, but in this case, it's thousands of simultaneous executions processing one input each.

  • I was thinking that a quota increase would make the throttling exception a genuine exception: it would still need to be handled, but it would no longer be the normal case. Sorry, I didn't really answer your original question.
