Runaway Glue jobs causing "Task allocated capacity exceeded limit" (Glue error code 400, InvalidInputException)?


My Glue jobs used the default 48-hour timeout (which I was not aware of initially), and they ended up stuck in a delayed loop testing for a specific file in a particular S3 bucket that never got created. Now when I run a simple Hello World type of Glue job, it consistently fails with the following error:

JobName:test and JobRunId:jr_6eb6af04d2a560f71d935ab3fca35504d7fdb99b748c0e0266e71402ced4437f_attempt_3 failed to execute with exception Task allocated capacity exceeded limit. (Service: AWSGlueJobExecutor; Status Code: 400; Error Code: InvalidInputException; Request ID: 7e43f436-4ca4-403e-a50f-8a15672ea2ef; Proxy: null)

I think this error is down to Glue job runs possibly still running, so that the allocated capacity limit is being exceeded, although I have not seen any CloudWatch logs being updated now, after 24 hours.

Questions:
1) Is this error caused by Glue jobs that may still be running in the background?
2) Is there a way to list and kill these still-running Glue jobs to free up the resources? I have already tried the AWS CLI command aws glue batch-get-jobs --job-names ..., but had no joy listing them.

I have now set my Glue job timeout to 60 minutes in my Terraform code as a safeguard.

Any help or guidance will be appreciated, thank you.

Asked 2 years ago, 717 views
1 Answer
Accepted Answer

Hello,

The error you are getting for your job "test" with job run ID "jr_6eb6af04d2a560f71d935ab3fca35504d7fdb99b748c0e0266e71402ced4437f" is indeed due to throttling: the resources allocated to your account have reached their limit.

You can refer to the document below for more detail on the default quota limits: https://docs.aws.amazon.com/general/latest/gr/glue.html

Please find below the responses for your questions:

Q1) Is this error caused by Glue jobs that may still be running in the background?

Your assumption is correct. While a Glue job is running, resources stay allocated to that run. If you start another job in parallel and there are not enough resources left to satisfy its requirements, the new job fails. This is because the quota limits apply to the whole account, not to individual jobs.
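To check whether earlier runs are still holding capacity, a rough sketch along the following lines may help. It assumes boto3 is installed and AWS credentials are configured. The function names (active_job_runs, total_capacity, main) are hypothetical helpers, not part of any AWS library. Treating STARTING, RUNNING and STOPPING as "active" is an assumption, and the summed MaxCapacity is only an approximate indicator of what counts against the account quota.

```python
# Sketch: find Glue job runs that are still active across all jobs in the
# account/region and add up the capacity they report.
# Assumption: runs in these states are the ones still holding resources.
ACTIVE_STATES = {"STARTING", "RUNNING", "STOPPING"}


def _pages(call, key, **kwargs):
    """Yield items from a Glue API call that paginates with NextToken."""
    token = None
    while True:
        if token:
            kwargs["NextToken"] = token
        response = call(**kwargs)
        yield from response.get(key, [])
        token = response.get("NextToken")
        if not token:
            break


def active_job_runs(glue):
    """Return every job run, for every job, whose state is in ACTIVE_STATES."""
    runs = []
    for job in _pages(glue.get_jobs, "Jobs"):
        for run in _pages(glue.get_job_runs, "JobRuns", JobName=job["Name"]):
            if run.get("JobRunState") in ACTIVE_STATES:
                runs.append(run)
    return runs


def total_capacity(runs):
    """Sum the MaxCapacity (DPUs) reported by the given runs."""
    return sum(run.get("MaxCapacity", 0) for run in runs)


def main():
    """Print active runs; needs boto3 and valid AWS credentials."""
    import boto3  # imported here so the helpers above work without boto3

    glue = boto3.client("glue")
    runs = active_job_runs(glue)
    for run in runs:
        print(run["JobName"], run["Id"], run["JobRunState"], run.get("MaxCapacity"))
    print("Approximate capacity in use:", total_capacity(runs))
```

Calling main() prints one line per active run. If the list is empty, leftover runs are probably not the cause and the quota itself is worth checking.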

Please refer to the below steps to increase the service quota limits from your console:

  1. Open Service Quotas console in AWS
  2. Click on AWS services from the left pane and search for Glue
  3. Click on AWS Glue and search for Quota name you want to increase
  4. Click on respective service quota and select "Request quota increase"
  5. Enter the new value under "Change quota value" and submit the request. Small increases are often approved automatically, but approval is not guaranteed.

Q2) Is there a way to list and kill these still-running Glue jobs to free up the resources? I have already tried the AWS CLI command aws glue batch-get-jobs --job-names ..., but had no joy listing them.

To view the metadata for all runs of a given job, you can use "get-job-runs". Usage: aws glue get-job-runs --job-name "test"

Please refer to https://docs.aws.amazon.com/cli/latest/reference/glue/get-job-runs.html for more details and usage.

To stop one or more job runs for a specified job, you can use the "batch-stop-job-run" CLI command. Usage: aws glue batch-stop-job-run --job-name "test" --job-run-ids "jrxxxxxxx"

Please refer to https://docs.aws.amazon.com/cli/latest/reference/glue/batch-stop-job-run.html for more details and usage.
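The two CLI commands above can also be combined in a short boto3 script. The following is a minimal sketch, not an official procedure. It assumes boto3 and configured credentials. The helper names (running_run_ids, stop_job_runs, main) are hypothetical, and the batch size of 25 run IDs per call is an assumption about the API limit that should be checked against the current documentation.

```python
# Sketch: list the run IDs of a job that are still starting or running,
# then ask Glue to stop them in batches.
def running_run_ids(glue, job_name):
    """Collect IDs of runs of job_name that are in STARTING or RUNNING state."""
    ids, token = [], None
    while True:
        kwargs = {"JobName": job_name}
        if token:
            kwargs["NextToken"] = token
        response = glue.get_job_runs(**kwargs)
        for run in response.get("JobRuns", []):
            if run.get("JobRunState") in ("STARTING", "RUNNING"):
                ids.append(run["Id"])
        token = response.get("NextToken")
        if not token:
            return ids


def stop_job_runs(glue, job_name, run_ids, batch_size=25):
    """Call batch_stop_job_run in chunks; return (stopped_ids, errors)."""
    stopped, errors = [], []
    for start in range(0, len(run_ids), batch_size):
        chunk = run_ids[start:start + batch_size]
        response = glue.batch_stop_job_run(JobName=job_name, JobRunIds=chunk)
        stopped += [s["JobRunId"] for s in response.get("SuccessfulSubmissions", [])]
        errors += response.get("Errors", [])
    return stopped, errors


def main(job_name="test"):
    """Stop all active runs of one job; needs boto3 and AWS credentials."""
    import boto3  # imported here so the helpers above work without boto3

    glue = boto3.client("glue")
    stopped, errors = stop_job_runs(glue, job_name, running_run_ids(glue, job_name))
    print("Stop requested for:", stopped)
    print("Errors:", errors)
```

A stop request is asynchronous, so a run may remain in the STOPPING state for a while before its capacity is released.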

AWS
Ankur_J
Answered 2 years ago
