Catching throttling exceptions in a Step Function

I have an S3 Batch operation that calls a Lambda. That Lambda initiates a Step Function (A), which in turn calls another Step Function (B) and waits for its response before moving on.

The Lambda is handling rate limiting well – if it gets throttled, it returns a TemporaryFailure status and S3 Batch deals with it just fine. I've had no throttling-related failures on the batch/lambda side because of this.
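For readers unfamiliar with this pattern, here is a minimal sketch of how a Lambda invoked by S3 Batch Operations might report throttling as a retryable result. It is not my actual function: `handle_task`, `is_throttled`, and the injected `start_execution` callable are hypothetical names, and the throttling check assumes the raised exception carries a botocore-style `response["Error"]["Code"]` dictionary.

```python
def is_throttled(exc):
    """True if the exception looks like a botocore-style throttling error.

    Assumption: the exception exposes a `response` dict shaped like
    {"Error": {"Code": "ThrottlingException"}}.
    """
    error = getattr(exc, "response", None) or {}
    return error.get("Error", {}).get("Code") == "ThrottlingException"


def handle_task(event, start_execution):
    """Process one S3 Batch Operations task and build the response payload.

    `start_execution` is a hypothetical callable that starts Step Function A
    for the given task; it is injected so the sketch runs without AWS access.
    """
    task = event["tasks"][0]
    try:
        start_execution(task)
        code, message = "Succeeded", "execution started"
    except Exception as exc:
        if is_throttled(exc):
            # S3 Batch Operations re-invokes the task later on TemporaryFailure.
            code, message = "TemporaryFailure", "throttled; retry later"
        else:
            code, message = "PermanentFailure", str(exc)

    return {
        "invocationSchemaVersion": event["invocationSchemaVersion"],
        "treatMissingKeysAs": "PermanentFailure",
        "invocationId": event["invocationId"],
        "results": [
            {"taskId": task["taskId"], "resultCode": code, "resultString": message}
        ],
    }
```

In a real handler, `start_execution` would wrap the call that starts Step Function A; anything other than throttling is reported here as a permanent failure, which is a design choice rather than a requirement.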

Unfortunately Step Function A fails a good percentage of the time. The reason given is StepFunctions.AWSStepFunctionsException, with the message:

Rate exceeded (Service: AWSStepFunctions; Status Code: 400; Error Code: ThrottlingException; Request ID: [redacted]; Proxy: null)

To be clear, this isn't one step function execution trying to start several others – it's 15,000 separate executions of Step Function A (which all start normally) trying to start 15,000 executions of Step Function B. It's those StartExecution states that are failing.

I suppose I could add a Retrier to the StartExecution state in Step Function A that catches StepFunctions.AWSStepFunctionsException and retries several times with a reasonable backoff. But I don't want to catch every StepFunctions.AWSStepFunctionsException, only the ones caused by throttling. I can't figure out how to do that.
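One possible workaround, sketched below under stated assumptions rather than offered as a confirmed solution: since a Retrier matches only on the error name, catch the exception instead, route it to a Choice state that inspects the error's Cause text, and loop back through a Wait state only when the Cause mentions ThrottlingException. The Python below just builds the Amazon States Language fragment as a dictionary and prints it as JSON. The state names, the `$.payload` and `$.startError` paths, the fixed wait, and the example ARN are all hypothetical, and the sketch assumes the Cause contains the message text quoted above.

```python
import json

# Assumption: the caught error's Cause contains "ThrottlingException",
# as in the "Rate exceeded ..." message quoted in the question.
THROTTLE_PATTERN = "*ThrottlingException*"


def build_states(state_machine_b_arn, wait_seconds=30):
    """Return ASL states that retry StartExecution only when throttled."""
    return {
        "StartB": {
            "Type": "Task",
            "Resource": "arn:aws:states:::states:startExecution.sync:2",
            "Parameters": {
                "StateMachineArn": state_machine_b_arn,
                "Input.$": "$.payload",  # hypothetical input field
            },
            "Catch": [
                {
                    "ErrorEquals": ["StepFunctions.AWSStepFunctionsException"],
                    # Keep the original input; store Error/Cause alongside it.
                    "ResultPath": "$.startError",
                    "Next": "WasThrottled",
                }
            ],
            "End": True,
        },
        "WasThrottled": {
            "Type": "Choice",
            "Choices": [
                {
                    "Variable": "$.startError.Cause",
                    "StringMatches": THROTTLE_PATTERN,
                    "Next": "BackOff",
                }
            ],
            # Any other AWSStepFunctionsException is treated as a real failure.
            "Default": "StartFailed",
        },
        "BackOff": {"Type": "Wait", "Seconds": wait_seconds, "Next": "StartB"},
        "StartFailed": {
            "Type": "Fail",
            "Error": "StartExecutionFailed",
            "Cause": "StartExecution failed for a reason other than throttling",
        },
    }


definition = {
    "StartAt": "StartB",
    "States": build_states(
        "arn:aws:states:us-east-1:123456789012:stateMachine:StepFunctionB"
    ),
}
print(json.dumps(definition, indent=2))
```

As written, the loop retries indefinitely with a fixed delay; a real definition would probably want an attempt counter in the state input and a growing wait, which this sketch leaves out.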

mbklein
asked 10 months ago, 918 views
1 Answer

Would increasing this quota solve most of the throttling? See "Quotas related to API action throttling".

kentrad
answered 10 months ago
  • I looked into a quota increase, but that seems a bit like a cop-out on our end, and also potentially unreliable. My experience has been that rate limiting and error handling are far more scalable than adding capacity.

    Rather than increasing the quota and hoping we don't exceed the new one, I'd much rather do some rate limiting and/or retry on throttling in the Step Function itself. Some (not all) of this can be achieved with batching when the Step Function is mapping over a list of inputs, but in this case, it's thousands of simultaneous executions processing one input each.

  • I was thinking that a quota increase would make throttling the exception rather than the norm. It would still need to be handled, but it would no longer be routine. Sorry, I didn't really answer your original question.
