Glue ThrottlingException when starting more than 15 Glue jobs in parallel via Step Functions


We are using Step Functions for our ETL pipeline. The first step kicks off 21 jobs that each take about 1-3 minutes and consume 5 DPUs each. The Step Function fails with the error below when it tries to run more than 15 Glue jobs in parallel. We are using the arn:aws:states:::glue:startJobRun.sync task to invoke the jobs synchronously. Is there a quota I need to request an increase for? Kicking off 21 jobs in parallel seems pretty reasonable.

{ "resourceType": "glue", "resource": "startJobRun.sync", "error": "Glue.AWSGlueException", "cause": "Rate exceeded (Service: AWSGlue; Status Code: 400; Error Code: ThrottlingException; Proxy: null)" }
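For reference, here is a minimal Amazon States Language sketch of the setup described above. A Parallel state runs one Glue job per branch through the startJobRun.sync integration, so the state waits for every run to finish. This is not the asker's actual definition. The state names (RunEtlJobs, RunJob01, RunJob02) and job names (etl-job-01, etl-job-02) are hypothetical and are flagged as such in the Comment fields. Only two of the 21 branches are shown.

```json
{
  "Comment": "Hypothetical sketch: fan out Glue jobs in parallel and wait for each run to finish",
  "StartAt": "RunEtlJobs",
  "States": {
    "RunEtlJobs": {
      "Type": "Parallel",
      "Comment": "One branch per Glue job; the real pipeline would have 21 branches",
      "Branches": [
        {
          "StartAt": "RunJob01",
          "States": {
            "RunJob01": {
              "Type": "Task",
              "Comment": "Hypothetical job name",
              "Resource": "arn:aws:states:::glue:startJobRun.sync",
              "Parameters": { "JobName": "etl-job-01" },
              "End": true
            }
          }
        },
        {
          "StartAt": "RunJob02",
          "States": {
            "RunJob02": {
              "Type": "Task",
              "Comment": "Hypothetical job name",
              "Resource": "arn:aws:states:::glue:startJobRun.sync",
              "Parameters": { "JobName": "etl-job-02" },
              "End": true
            }
          }
        }
      ],
      "End": true
    }
  }
}
```

All branches start at roughly the same moment, so the Glue API calls behind them arrive in a burst. That burst is consistent with the "Rate exceeded" error above.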

  • Hello tjtoll, would you mind posting the code snippet that solved the issue in your case?

    It would be much appreciated. Thank you very much!

    Kind regards, Armin

tjtoll
asked 2 years ago · 4899 views
1 Answer
Accepted Answer

Hi! Good question.

For general Glue service quotas (limits), see: https://docs.aws.amazon.com/general/latest/gr/glue.html

Default Glue quotas include:

  • Max concurrent job runs per account (50)
  • Max jobs per trigger (50)

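To raise those and other limits, you can submit a quota increase request through Service Quotas. Note, though, that 21 concurrent runs is below the default of 50. The error you are seeing ("Rate exceeded") is a limit on API request rate rather than on concurrent job runs.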

For throttling exceptions (https://docs.aws.amazon.com/glue/latest/webapi/CommonErrors.html), I'm not sure of the exact rate at which API calls get throttled. If that is the limit you are hitting, you may need to retry with exponential backoff (I've seen this with other API calls): https://docs.aws.amazon.com/general/latest/gr/api-retries.html
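As a worked example of exponential backoff, assume a base delay I and a growth factor B. The wait before the n-th retry is then I multiplied by B raised to the power n − 1. With the illustrative values I = 5 s and B = 2, the first four retries wait 5, 10, 20 and 40 seconds, which adds up to 75 seconds in total:

```latex
\begin{aligned}
d_n &= I \cdot B^{\,n-1}, \qquad \sum_{n=1}^{k} d_n = I \, \frac{B^{k} - 1}{B - 1} \\
I = 5\,\mathrm{s},\ B = 2:\quad d_1, d_2, d_3, d_4 &= 5,\ 10,\ 20,\ 40\,\mathrm{s}, \qquad
\sum_{n=1}^{4} d_n = 5 \cdot \frac{2^{4} - 1}{2 - 1} = 75\,\mathrm{s}
\end{aligned}
```

The linked AWS retry guidance also recommends adding random jitter to each delay. Without it, many callers throttled at the same instant would all retry together.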

jsonc
answered 2 years ago
Reviewed by an AWS Expert 2 years ago
  • Thanks. I ended up using the retry/backoff options on each of those jobs so they retry after 5 seconds.
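Since a snippet was requested above, here is a sketch of what retry with backoff on one of these task states could look like in Amazon States Language. It is shown as a single entry of a States object and is not the asker's actual definition. The state and job names are hypothetical. All values other than the 5-second interval are assumptions. ErrorEquals uses Glue.AWSGlueException, the error name reported in the question. That name may also match other Glue API errors, so keep MaxAttempts modest.

```json
{
  "RunJob01": {
    "Type": "Task",
    "Comment": "Hypothetical state/job name; retries throttled Glue calls with exponential backoff",
    "Resource": "arn:aws:states:::glue:startJobRun.sync",
    "Parameters": { "JobName": "etl-job-01" },
    "Retry": [
      {
        "ErrorEquals": ["Glue.AWSGlueException"],
        "IntervalSeconds": 5,
        "BackoffRate": 2.0,
        "MaxAttempts": 5,
        "MaxDelaySeconds": 60,
        "JitterStrategy": "FULL"
      }
    ],
    "End": true
  }
}
```

Before jitter, BackoffRate 2.0 doubles each wait: 5, 10, 20, 40 seconds and so on, with MaxDelaySeconds capping any single wait at 60 seconds. JitterStrategy "FULL" randomizes each wait, so branches that were throttled together do not all retry at the same moment. MaxDelaySeconds and JitterStrategy are newer Retry fields. Drop them if your tooling rejects them, and you still get plain exponential backoff.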
