Catching throttling exceptions in a Step Function


I have an S3 Batch operation that calls a Lambda. That Lambda initiates a Step Function (A), which in turn calls another Step Function (B) and waits for its response before moving on.

The Lambda is handling rate limiting well – if it gets throttled, it returns a TemporaryFailure status and S3 Batch deals with it just fine. I've had no throttling-related failures on the batch/lambda side because of this.
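For readers unfamiliar with that pattern, here is a minimal sketch of how a handler might map a throttle to a `TemporaryFailure` result for S3 Batch Operations. It is not the poster's actual Lambda: `handle_batch_event` and the injected `start_execution` callable are hypothetical names, and the sketch assumes the throttle arrives as a botocore-style exception whose `response["Error"]["Code"]` is `ThrottlingException`.

```python
def error_code(exc):
    """Return the AWS error code from a botocore-style ClientError, or None."""
    response = getattr(exc, "response", None) or {}
    return response.get("Error", {}).get("Code")


def handle_batch_event(event, start_execution):
    """Run each S3 Batch task and report a per-task result code.

    `start_execution` is a hypothetical callable wrapping the real
    StartExecution call for Step Function A.
    """
    results = []
    for task in event["tasks"]:
        try:
            start_execution(task)
            code, message = "Succeeded", "execution started"
        except Exception as exc:
            if error_code(exc) == "ThrottlingException":
                # S3 Batch retries tasks that report TemporaryFailure.
                code, message = "TemporaryFailure", "throttled, retry later"
            else:
                code, message = "PermanentFailure", str(exc)
        results.append(
            {"taskId": task["taskId"], "resultCode": code, "resultString": message}
        )
    return {
        "invocationSchemaVersion": event["invocationSchemaVersion"],
        "treatMissingKeysAs": "PermanentFailure",
        "invocationId": event["invocationId"],
        "results": results,
    }
```

In a real handler, `start_execution` would presumably call the Step Functions client; it is injected here so the sketch runs without AWS credentials.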

Unfortunately Step Function A fails a good percentage of the time. The reason given is StepFunctions.AWSStepFunctionsException, with the message:

Rate exceeded (Service: AWSStepFunctions; Status Code: 400; Error Code: ThrottlingException; Request ID: [redacted]; Proxy: null)

To be clear, this isn't one step function execution trying to start several others – it's 15,000 separate executions of Step Function A (which all start normally) trying to start 15,000 executions of Step Function B. It's those StartExecution states that are failing.

I suppose I could add a Retrier to my StartExecution state in Step Function A that would catch StepFunctions.AWSStepFunctionsException and retry a bunch of times with a reasonable backoff. But I don't necessarily want to catch all StepFunctions.AWSStepFunctionsExceptions; just the ones that get throttled. I can't figure out how to do that.
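One possible workaround, offered as a sketch rather than a confirmed answer: because the throttle surfaces under the generic `StepFunctions.AWSStepFunctionsException` name, a Retrier's `ErrorEquals` cannot single it out. A Catcher can instead route the error to a Choice state that inspects the error's `Cause` text and loops back through a Wait state only when it mentions `ThrottlingException`. The Python below just builds the state definitions as a dictionary and prints them as JSON. The state names, the ARN, the `$.payload` input path, and the wait duration are all hypothetical, and it assumes the `Cause` field carries the message quoted above.

```python
import json

# Hypothetical placeholder: substitute the real ARN of Step Function B.
STATE_MACHINE_B_ARN = "arn:aws:states:us-east-1:123456789012:stateMachine:StepFunctionB"


def throttle_aware_states(wait_seconds=30):
    """Build states that retry StartExecution only when the error is a throttle."""
    return {
        "StartB": {
            "Type": "Task",
            "Resource": "arn:aws:states:::states:startExecution.sync:2",
            "Parameters": {
                "StateMachineArn": STATE_MACHINE_B_ARN,
                "Input.$": "$.payload",  # hypothetical input path
            },
            "ResultPath": "$.result",
            "Catch": [
                {
                    "ErrorEquals": ["StepFunctions.AWSStepFunctionsException"],
                    # Keep the original input; attach Error and Cause beside it.
                    "ResultPath": "$.startError",
                    "Next": "IsThrottled",
                }
            ],
            "Next": "Done",
        },
        "IsThrottled": {
            "Type": "Choice",
            "Choices": [
                {
                    "Variable": "$.startError.Cause",
                    "StringMatches": "*ThrottlingException*",
                    "Next": "WaitBeforeRetry",
                }
            ],
            # Any other AWSStepFunctionsException is treated as a real failure.
            "Default": "StartFailed",
        },
        "WaitBeforeRetry": {"Type": "Wait", "Seconds": wait_seconds, "Next": "StartB"},
        "StartFailed": {
            "Type": "Fail",
            "Error": "StartExecutionFailed",
            "Cause": "StartExecution failed for a reason other than throttling",
        },
        "Done": {"Type": "Succeed"},
    }


if __name__ == "__main__":
    print(json.dumps(throttle_aware_states(), indent=2))
```

As written, the loop has no attempt cap and uses a fixed wait, so thousands of executions throttled together would all retry at about the same moment. A counter in the state data and a varied wait would be needed before relying on it.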

mbklein
asked a year ago, 983 views
1 Answer

Would increasing this quota solve most of the throttling? See "Quotas related to API action throttling".

AWS
EXPERT
kentrad
answered a year ago
  • I looked into a quota increase, but that seems a bit like a cop-out on our end, and also potentially unreliable. My experience has been that rate limiting and error handling are far more scalable than adding capacity.

    Rather than increasing the quota and hoping we don't exceed the new one, I'd much rather do some rate limiting and/or retry on throttling in the Step Function itself. Some (not all) of this can be achieved with batching when the Step Function is mapping over a list of inputs, but in this case, it's thousands of simultaneous executions processing one input each.

  • I was thinking that a quota increase would make the throttling exception an exception. It would still need to be handled, but it would no longer be the norm. Sorry, I didn't really answer your original question.
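To get a feel for what "retrying with a reasonable backoff" inside the state machine would cost in wall-clock time, the delay schedule of a Retrier can be worked out from its `IntervalSeconds`, `BackoffRate`, and `MaxAttempts` fields (optionally capped by `MaxDelaySeconds`). The helper below is a hypothetical illustration of that arithmetic, not part of any AWS SDK, and it ignores jitter.

```python
def retry_delays(interval_seconds=1, backoff_rate=2.0, max_attempts=3,
                 max_delay_seconds=None):
    """Return the wait before each retry attempt, in seconds.

    The first retry waits `interval_seconds`; each later retry multiplies
    the previous wait by `backoff_rate`, optionally capped.
    """
    delays = []
    for attempt in range(max_attempts):
        delay = interval_seconds * backoff_rate ** attempt
        if max_delay_seconds is not None:
            delay = min(delay, max_delay_seconds)
        delays.append(delay)
    return delays


if __name__ == "__main__":
    schedule = retry_delays(interval_seconds=2, backoff_rate=2.0, max_attempts=5)
    print(schedule, "total:", sum(schedule))
```

With thousands of executions throttled at once, identical schedules would retry in lockstep, so it may be worth checking whether the Retrier's `JitterStrategy` field is available for the state in question.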
