Glue Throttling Exception when starting > 15 Glue jobs in Parallel via Step Function


We are using Step Functions for our ETL pipeline. The first step kicks off 21 Glue jobs, each of which runs for about 1-3 minutes and consumes 5 DPUs. The Step Function fails with the error below when it tries to run more than 15 Glue jobs in parallel. We are using the arn:aws:states:::glue:startJobRun.sync task to invoke the jobs synchronously. Is there a quota I need to request an increase for? Kicking off 21 jobs in parallel seems pretty reasonable.

{ "resourceType": "glue", "resource": "startJobRun.sync", "error": "Glue.AWSGlueException", "cause": "Rate exceeded (Service: AWSGlue; Status Code: 400; Error Code: ThrottlingException; Proxy: null)" }

  • Hello tjtoll, would you mind posting the code snippet that solved the issue in your case?

    It would be highly appreciated, thank you very much!

    Kind regards, Armin

tjtoll
asked 2 years ago · 4927 views
1 Answer
Accepted Answer

Hi! Good question.

For General Glue Service Quotas (Limits), please see here: https://docs.aws.amazon.com/general/latest/gr/glue.html

Default Glue Quotas include things like:

  • Max concurrent job runs per account (50)
  • Max jobs per trigger (50)

To increase those and other limits, you can open a Service Quota Increase Request.

For Throttling Exceptions (https://docs.aws.amazon.com/glue/latest/webapi/CommonErrors.html), I'm not sure of the exact rate at which API calls start being throttled. If that is what you are hitting, you may need to retry with exponential backoff (I've seen this for other API calls): https://docs.aws.amazon.com/general/latest/gr/api-retries.html
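To make the exponential backoff idea concrete, here is a minimal Python sketch of retrying with backoff and jitter. It is not tied to any AWS SDK: `ThrottledError` and `call_with_backoff` are hypothetical names introduced only for illustration, and the delay values are assumptions, not documented Glue limits.

```python
import random
import time


class ThrottledError(Exception):
    """Hypothetical stand-in for a throttling error such as Glue's ThrottlingException."""


def call_with_backoff(func, max_attempts=5, base_delay=1.0, max_delay=30.0, sleep=time.sleep):
    """Call func(), retrying on ThrottledError with exponential backoff and full jitter."""
    for attempt in range(max_attempts):
        try:
            return func()
        except ThrottledError:
            if attempt == max_attempts - 1:
                raise  # out of attempts; surface the error to the caller
            # The ceiling doubles on every attempt, up to max_delay.
            cap = min(max_delay, base_delay * (2 ** attempt))
            # Full jitter: wait a random time below the ceiling so that
            # parallel callers do not all retry at the same instant.
            sleep(random.uniform(0, cap))
```

In real use, `func` would wrap the call that starts the job and translate the SDK's throttling error into `ThrottledError` (or the `except` clause would catch the SDK's own exception type). The jitter matters here because many jobs are started at once and would otherwise retry in lockstep.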

jsonc
answered 2 years ago
AWS
EXPERT
reviewed 2 years ago
  • Thanks. I ended up using the retry/back off options on each of those jobs to have them retry after 5 seconds.
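
For readers looking for a snippet: the following is a sketch of what a Step Functions task state with a retry policy might look like, built as a Python dict and printed as JSON. It is not tjtoll's actual configuration. The helper name `glue_task_with_retry`, the job name, and the `MaxAttempts` and `BackoffRate` values are assumptions; only the resource ARN, the error name, and the 5 second interval come from this thread.

```python
import json


def glue_task_with_retry(job_name, interval_seconds=5, max_attempts=5, backoff_rate=2.0):
    """Build an Amazon States Language Task state that retries a Glue job run."""
    return {
        "Type": "Task",
        "Resource": "arn:aws:states:::glue:startJobRun.sync",
        "Parameters": {"JobName": job_name},
        "Retry": [
            {
                # Error name as reported in the question's failure output.
                "ErrorEquals": ["Glue.AWSGlueException"],
                "IntervalSeconds": interval_seconds,  # wait before the first retry
                "MaxAttempts": max_attempts,          # assumed value
                "BackoffRate": backoff_rate,          # assumed multiplier per retry
            }
        ],
        "End": True,
    }


# "my-etl-job" is a placeholder job name.
print(json.dumps(glue_task_with_retry("my-etl-job"), indent=2))
```

Note that `Glue.AWSGlueException` is the error name Step Functions reported for the throttled call, so this retrier would also retry other Glue failures reported under that same name.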
