Runaway Glue jobs causing a "Task allocated capacity exceeded limit" exception? (Glue Error Code 400: InvalidInputException)


My Glue jobs used the default 48-hour timeout (which I was not aware of initially), and they ended up stuck in a delayed polling loop waiting for a specific file in a particular S3 bucket that was never created. Now, when I run a simple Hello World Glue job, it consistently fails with the following error:

JobName:test and JobRunId:jr_6eb6af04d2a560f71d935ab3fca35504d7fdb99b748c0e0266e71402ced4437f_attempt_3 failed to execute with exception Task allocated capacity exceeded limit. (Service: AWSGlueJobExecutor; Status Code: 400; Error Code: InvalidInputException; Request ID: 7e43f436-4ca4-403e-a50f-8a15672ea2ef; Proxy: null)

I suspect this error is caused by Glue job tasks that may still be running and therefore exceeding the allocated capacity limit, although I no longer see any CloudWatch logs being updated after 24 hours.

Questions:
1) Is this error caused by the Glue jobs possibly still running in the background?
2) Is there a way to list and kill these still-running Glue jobs to free up the resources? I have already tried the AWS CLI (aws glue batch-get-jobs --job-names ...), but had no luck listing them.

As a safeguard, I have now set the Glue job timeout to 60 minutes in my Terraform code.
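
Besides the job-level timeout, the wait loop itself can be given a deadline, so that a file that never arrives fails the run quickly instead of looping until the timeout. The following is a minimal Python sketch, assuming the check is a simple polling loop; `wait_for_object` and `object_exists` are hypothetical names (in practice `object_exists` might wrap an S3 `head_object` call), and the default intervals are illustrative only.

```python
import time
from typing import Callable


def wait_for_object(
    object_exists: Callable[[], bool],
    max_wait_seconds: float = 600.0,
    poll_interval_seconds: float = 30.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll object_exists() until it returns True or the deadline passes.

    Returns True if the object appeared, False if the deadline was reached.
    clock and sleep are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + max_wait_seconds
    while True:
        if object_exists():
            return True
        remaining = deadline - clock()
        if remaining <= 0:
            return False
        # Never sleep past the deadline.
        sleep(min(poll_interval_seconds, remaining))
```

A caller would typically raise an exception when this returns False, so the run ends as FAILED within minutes rather than holding capacity for the full timeout.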

Any help or guidance will be appreciated, thank you.

asked 2 years ago · 707 views
1 Answer
Accepted Answer

Hello,

The error you are getting for your job "test" with job run ID "jr_6eb6af04d2a560f71d935ab3fca35504d7fdb99b748c0e0266e71402ced4437f" is indeed caused by the service quotas that limit the resources allocated to your account.

For details on the default quotas, refer to the following document: https://docs.aws.amazon.com/general/latest/gr/glue.html

Below are the responses to your questions:

Q1) Is this error caused by the Glue jobs possibly still running in the background?

Your assumption is correct. While a Glue job is running, resources remain allocated to it. If you start another job in parallel and there is not enough remaining capacity to satisfy its requirements, the new job fails. This is because the quotas apply to the whole account (per Region), not to individual jobs, so runs that are still active keep counting against them until they finish, time out, or are stopped.
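
To check whether any runs are still holding capacity, every job definition can be scanned for runs that have not finished. The following Python sketch assumes boto3; `glue` is expected to be something like `boto3.client("glue")`, and `active_job_runs`, `_items`, and `ACTIVE_STATES` are hypothetical names chosen for this illustration. It uses the `get_jobs` and `get_job_runs` calls and follows `NextToken` pagination manually.

```python
from typing import Any, Callable, Dict, Iterator, List, Tuple

# Run states in which a run has not yet finished and may still hold capacity.
ACTIVE_STATES = {"STARTING", "RUNNING", "STOPPING"}


def _items(call: Callable[..., Dict[str, Any]], key: str, **kwargs: Any) -> Iterator[Dict[str, Any]]:
    """Call a paginated API repeatedly, yielding the items stored under `key`."""
    token = None
    while True:
        params = dict(kwargs)
        if token:
            params["NextToken"] = token
        response = call(**params)
        yield from response.get(key, [])
        token = response.get("NextToken")
        if not token:
            return


def active_job_runs(glue) -> List[Tuple[str, str, str]]:
    """Return (job_name, run_id, state) for every run that has not finished."""
    found = []
    for job in _items(glue.get_jobs, "Jobs"):
        for run in _items(glue.get_job_runs, "JobRuns", JobName=job["Name"]):
            if run["JobRunState"] in ACTIVE_STATES:
                found.append((job["Name"], run["Id"], run["JobRunState"]))
    return found
```

For example, `active_job_runs(boto3.client("glue"))` returning an empty list would suggest that no runs are still active in that Region.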

To increase a service quota from the console:

  1. Open the Service Quotas console.
  2. Choose AWS services in the left pane and search for Glue.
  3. Choose AWS Glue and search for the name of the quota you want to increase.
  4. Choose that quota and select "Request quota increase".
  5. Enter the new value under "Change quota value". Depending on the value requested, the increase may be approved automatically or may be reviewed first.
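
The same request can also be made programmatically through the Service Quotas API. This Python sketch assumes boto3 (`quotas` would be something like `boto3.client("service-quotas")`) and looks the quota up by the display name shown in the console; `request_glue_quota_increase` is a hypothetical helper, and the quota name must be copied from the console rather than guessed. Note that `list_service_quotas` may omit quotas that only have default values, in which case `list_aws_default_service_quotas` can be used to find the quota code.

```python
from typing import Any, Dict, Optional


def request_glue_quota_increase(quotas, quota_name: str, desired_value: float) -> Dict[str, Any]:
    """Find a Glue quota by its display name and request a higher value."""
    token: Optional[str] = None
    while True:
        params: Dict[str, Any] = {"ServiceCode": "glue"}
        if token:
            params["NextToken"] = token
        response = quotas.list_service_quotas(**params)
        for quota in response.get("Quotas", []):
            if quota["QuotaName"] == quota_name:
                if desired_value <= quota["Value"]:
                    raise ValueError(f"{quota_name} is already {quota['Value']}")
                # Submit the increase request using the quota's code.
                return quotas.request_service_quota_increase(
                    ServiceCode="glue",
                    QuotaCode=quota["QuotaCode"],
                    DesiredValue=desired_value,
                )
        token = response.get("NextToken")
        if not token:
            raise LookupError(f"No Glue quota named {quota_name!r}")
```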

Q2) Is there a way to list and kill these still-running Glue jobs to free up the resources? I have already tried the AWS CLI (aws glue batch-get-jobs --job-names ...), but had no luck listing them.

To view the metadata for all runs of a given job, use get-job-runs. (batch-get-jobs returns job definitions rather than job runs, which is why it did not list them.) Usage: aws glue get-job-runs --job-name "test"

Please refer to https://docs.aws.amazon.com/cli/latest/reference/glue/get-job-runs.html for more details and usage.
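
The output of get-job-runs can be long, so it can help to count the runs by state: runs left over from the loop would show up as RUNNING (or STARTING/STOPPING), while runs that hit the 48-hour limit end as TIMEOUT. The following Python sketch assumes boto3 (`glue` would be something like `boto3.client("glue")`); `summarize_job_runs` is a hypothetical helper that follows `NextToken` pagination.

```python
from collections import Counter
from typing import Any, Dict, Optional


def summarize_job_runs(glue, job_name: str) -> Counter:
    """Count all runs of one job by JobRunState, across every result page."""
    counts: Counter = Counter()
    token: Optional[str] = None
    while True:
        params: Dict[str, Any] = {"JobName": job_name}
        if token:
            params["NextToken"] = token
        response = glue.get_job_runs(**params)
        for run in response.get("JobRuns", []):
            counts[run["JobRunState"]] += 1
        token = response.get("NextToken")
        if not token:
            return counts
```

For example, `summarize_job_runs(boto3.client("glue"), "test")` gives a quick overview of how many runs of "test" are still active.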

To stop one or more runs of a specified job, use the batch-stop-job-run CLI command. Usage: aws glue batch-stop-job-run --job-name "test" --job-run-ids "jrxxxxxxx"

Please refer to https://docs.aws.amazon.com/cli/latest/reference/glue/batch-stop-job-run.html for more details and usage.
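
When many runs need to be stopped, the IDs can be submitted in batches. This Python sketch assumes boto3 (`glue` would be something like `boto3.client("glue")`) and a batch size of 25, which is, to my knowledge, the maximum number of run IDs accepted per `batch_stop_job_run` call; `stop_job_runs` is a hypothetical helper. Stopping is asynchronous, so runs typically pass through STOPPING before reaching STOPPED.

```python
from typing import Any, Dict, List


def stop_job_runs(glue, job_name: str, run_ids: List[str], batch_size: int = 25) -> List[Dict[str, Any]]:
    """Request a stop for the given runs of one job, batch_size IDs per call.

    Returns the per-run errors reported by the service (empty if all were accepted).
    """
    errors: List[Dict[str, Any]] = []
    for start in range(0, len(run_ids), batch_size):
        response = glue.batch_stop_job_run(
            JobName=job_name,
            JobRunIds=run_ids[start:start + batch_size],
        )
        errors.extend(response.get("Errors", []))
    return errors
```

Any entries in the returned list carry the run ID and error details, so failed stop requests can be retried or investigated individually.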

Answered 2 years ago by Ankur_J (AWS)
