An error occurred while calling o352.pyWriteDynamicFrame. Job 1 cancelled because SparkContext was shut down caused by threshold for consecutive task creation reached

Hi, I have a Glue job script that ingests tables from a Postgres database into an AWS Glue Data Catalog database. The ingestion has three steps (a rough sketch follows the list):

  1. Read the Postgres tables into a Spark DataFrame
  2. Convert the Spark DataFrame to a Glue DynamicFrame
  3. Write the DynamicFrame directly to the target table using the sink's writeFrame()
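
This is a minimal sketch of that flow, not the actual script; the JDBC connection details, bucket path, catalog database, and job parameter names are placeholders:

```python
# Minimal sketch of the ingestion job; connection details and names are placeholders.
import sys

from awsglue.context import GlueContext
from awsglue.dynamicframe import DynamicFrame
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME", "source_table"])

sc = SparkContext()
glue_context = GlueContext(sc)
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# 1. Read the Postgres table into a Spark DataFrame over JDBC
spark_df = (
    glue_context.spark_session.read.format("jdbc")
    .option("url", "jdbc:postgresql://<host>:5432/<database>")  # placeholder
    .option("dbtable", args["source_table"])
    .option("user", "<user>")          # placeholder
    .option("password", "<password>")  # placeholder
    .load()
)

# 2. Convert the Spark DataFrame to a Glue DynamicFrame
dyf = DynamicFrame.fromDF(spark_df, glue_context, "dyf")

# 3. Write the DynamicFrame through a Glue data sink that updates the Data Catalog
sink = glue_context.getSink(
    connection_type="s3",
    path="s3://<bucket>/<prefix>/",  # placeholder
    enableUpdateCatalog=True,
    updateBehavior="UPDATE_IN_DATABASE",
)
sink.setCatalogInfo(
    catalogDatabase="<catalog_database>",        # placeholder
    catalogTableName=args["source_table"],
)
sink.setFormat("glueparquet")
sink.writeFrame(dyf)

job.commit()
```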

We set "Maximum concurrency" to 8 for this job. Another Glue job, running as part of a workflow, triggers this job 8 times with different parameters to ingest 8 tables simultaneously. The 8 concurrent job runs use about 100 DPUs in total. Sometimes all of the runs succeed, but sometimes some succeed while others fail with the following error:
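
For context, the triggering side is roughly equivalent to the following; this is a simplified stand-in for the actual workflow trigger, and the job name and the `--source_table` argument are placeholders:

```python
# Simplified stand-in for the workflow step that fans out the 8 ingestion runs.
import boto3

glue = boto3.client("glue")

tables_to_ingest = ["table_a", "table_b", "table_c"]  # 8 tables in the real setup

for table in tables_to_ingest:
    # Each start_job_run counts against the job's "Maximum concurrency" (currently 8)
    response = glue.start_job_run(
        JobName="postgres-to-catalog-ingestion",   # placeholder job name
        Arguments={"--source_table": table},
    )
    print(table, response["JobRunId"])
```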

An error occurred while calling o352.pyWriteDynamicFrame. Job 1 cancelled because SparkContext was shut down caused by threshold for consecutive task creation reached

The error message above indicates the job failed while calling o352.pyWriteDynamicFrame, but the same failure has also occurred while calling o93.purgeS3Path. So I don't think it is tied to a specific function in the job; it seems more likely to be related to the job configuration. I couldn't find any answer on this online. I also checked our service quotas and don't think the jobs exceed any limits, such as the maximum number of concurrently running DPUs or the maximum number of concurrent job runs. Do you have any suggestions on why this happens and how to fix it? Should I set "Maximum concurrency" to a higher number, like 16, for the job?
