Catching throttling exceptions in a Step Function

I have an S3 Batch operation that calls a Lambda. That Lambda initiates a Step Function (A), which in turn calls another Step Function (B) and waits for its response before moving on.

The Lambda is handling rate limiting well – if it gets throttled, it returns a TemporaryFailure status and S3 Batch deals with it just fine. I've had no throttling-related failures on the batch/lambda side because of this.
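A minimal Python sketch of that Lambda-side pattern, for illustration only (the actual handler isn't shown in this post). `handle_batch_event` and the injected `start_execution` callable are hypothetical names, and the sketch assumes the throttle surfaces as a boto3-style error whose `response["Error"]["Code"]` is `ThrottlingException`.

```python
def handle_batch_event(event, start_execution):
    """Handle one S3 Batch Operations task and report throttling as retryable.

    `start_execution` is a hypothetical callable that starts Step Function A
    for the task and raises on failure.
    """
    task = event["tasks"][0]
    try:
        start_execution(task)
        code, message = "Succeeded", "execution started"
    except Exception as exc:
        # boto3 ClientError carries the service error code in exc.response;
        # anything without that attribute falls through to PermanentFailure.
        error_code = getattr(exc, "response", {}).get("Error", {}).get("Code", "")
        if error_code == "ThrottlingException":
            # TemporaryFailure asks S3 Batch to retry this task later.
            code, message = "TemporaryFailure", "throttled, retry later"
        else:
            code, message = "PermanentFailure", str(exc)

    return {
        "invocationSchemaVersion": event["invocationSchemaVersion"],
        "treatMissingKeysAs": "PermanentFailure",
        "invocationId": event["invocationId"],
        "results": [
            {"taskId": task["taskId"], "resultCode": code, "resultString": message}
        ],
    }
```

In a real handler, `start_execution` would presumably wrap a boto3 `stepfunctions` client call; it is injected here so the throttling branch can be exercised without AWS access.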

Unfortunately Step Function A fails a good percentage of the time. The reason given is StepFunctions.AWSStepFunctionsException, with the message:

Rate exceeded (Service: AWSStepFunctions; Status Code: 400; Error Code: ThrottlingException; Request ID: [redacted]; Proxy: null)

To be clear, this isn't one step function execution trying to start several others – it's 15,000 separate executions of Step Function A (which all start normally) trying to start 15,000 executions of Step Function B. It's those StartExecution states that are failing.

I suppose I could add a Retrier to my StartExecution state in Step Function A that would catch StepFunctions.AWSStepFunctionsException and retry a bunch of times with a reasonable backoff. But I don't necessarily want to catch all StepFunctions.AWSStepFunctionsExceptions; just the ones that get throttled. I can't figure out how to do that.
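For reference, a sketch of what that blunt Retrier might look like, built as a Python dict so it can be dumped to Amazon States Language JSON. The state machine ARN, the `.sync:2` resource, the `Done` state name, and the interval, backoff, and attempt values are all placeholders rather than values from the real setup. As far as I can tell, `ErrorEquals` matches on the error name only, so this retries every `StepFunctions.AWSStepFunctionsException`, throttled or not.

```python
import json

# Hypothetical ARN; substitute the real Step Function B.
STATE_MACHINE_B_ARN = (
    "arn:aws:states:us-east-1:123456789012:stateMachine:StepFunctionB"
)

start_b_state = {
    "Type": "Task",
    # Assumed: the optimized integration that waits for B to finish.
    "Resource": "arn:aws:states:::states:startExecution.sync:2",
    "Parameters": {"StateMachineArn": STATE_MACHINE_B_ARN, "Input.$": "$"},
    "Retry": [
        {
            # Matches by error name, so non-throttling errors are retried too.
            "ErrorEquals": ["StepFunctions.AWSStepFunctionsException"],
            "IntervalSeconds": 2,   # wait before the first retry
            "BackoffRate": 2.0,     # multiplier applied to each later wait
            "MaxAttempts": 8,       # retries, not counting the first try
        }
    ],
    "Next": "Done",  # placeholder name for whatever state follows
}


def retry_delays(retrier):
    """Return the wait in seconds before each retry attempt."""
    return [
        retrier["IntervalSeconds"] * retrier["BackoffRate"] ** attempt
        for attempt in range(retrier["MaxAttempts"])
    ]


if __name__ == "__main__":
    print(json.dumps(start_b_state, indent=2))
    print(retry_delays(start_b_state["Retry"][0]))
```

With these placeholder numbers the waits double from 2 seconds up to 256 seconds, about eight and a half minutes in total. With thousands of executions backing off on the same schedule, some jitter is probably worth adding; newer versions of Amazon States Language accept a `JitterStrategy` field on retriers for that.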

mbklein
Asked a year ago · 983 views
1 Answer

Would increasing this quota solve most of the throttling? See "Quotas related to API action throttling" in the Step Functions documentation.

kentrad (AWS, Expert)
Answered a year ago
  • I looked into a quota increase, but that seems a bit like a cop-out on our end, and also potentially unreliable. My experience has been that rate limiting and error handling are far more scalable than adding capacity.

    Rather than increasing the quota and hoping we don't exceed the new one, I'd much rather do some rate limiting and/or retry throttled calls in the Step Function itself. Some (not all) of this can be achieved with batching when the Step Function is mapping over a list of inputs, but in this case, it's thousands of simultaneous executions processing one input each.

  • I was thinking that a quota increase would make the throttling exception a genuine exception: it would still need to be handled, but it wouldn't be the normal case. Sorry, I didn't really answer your original question.
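One possible way to retry only the throttled failures inside Step Function A, offered as an untested sketch: catch the exception, inspect its `Cause` in a Choice state, and loop back through a Wait state only when the cause mentions `ThrottlingException`. The state names, the ARN, the `$.payload` and `$.startError` paths, and the 30-second wait are all hypothetical, and it assumes the `Cause` string carries the message quoted in the question. The fragment is built in Python so it can be dumped to ASL JSON.

```python
import fnmatch
import json

# Wildcard pattern used by the Choice state's StringMatches comparison.
THROTTLE_PATTERN = "*ThrottlingException*"

definition = {
    "StartAt": "StartB",
    "States": {
        "StartB": {
            "Type": "Task",
            # Assumed integration and placeholder ARN.
            "Resource": "arn:aws:states:::states:startExecution.sync:2",
            "Parameters": {
                "StateMachineArn": "arn:aws:states:us-east-1:123456789012:stateMachine:StepFunctionB",
                "Input.$": "$.payload",  # hypothetical input location
            },
            "Catch": [
                {
                    "ErrorEquals": ["StepFunctions.AWSStepFunctionsException"],
                    # Keep the original input and attach the error details.
                    "ResultPath": "$.startError",
                    "Next": "IsThrottled",
                }
            ],
            "End": True,
        },
        "IsThrottled": {
            "Type": "Choice",
            "Choices": [
                {
                    "Variable": "$.startError.Cause",
                    "StringMatches": THROTTLE_PATTERN,
                    "Next": "BackOff",
                }
            ],
            "Default": "NotThrottled",
        },
        # Fixed wait for brevity; a real version would likely grow the delay.
        "BackOff": {"Type": "Wait", "Seconds": 30, "Next": "StartB"},
        "NotThrottled": {
            "Type": "Fail",
            "Error": "StartExecutionFailed",
            "Cause": "StartExecution failed for a reason other than throttling",
        },
    },
}


def looks_throttled(cause):
    """Local approximation of the Choice rule, for checking the pattern."""
    return fnmatch.fnmatchcase(cause, THROTTLE_PATTERN)


if __name__ == "__main__":
    print(json.dumps(definition, indent=2))
```

As written the loop is unbounded, so a real version would want an attempt counter in the state input and a Choice branch that gives up after some limit. It also retries on a fixed delay, which is cruder than a Retrier's backoff.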
