
Glue job keeps running and does not write results


I have created a job to migrate my Postgres data to S3, and I am implementing the full load right now. The table consists of a **lot of records** (count: 17,496,724), which is why I added 10 workers with the auto scaling option checked, but I keep getting this error. The job runs for a long time and does not generate any output. I have tried other worker counts too (5, 10), but get the same error. Below are the errors from the logs:

  1. ERROR [dispatcher-CoarseGrainedScheduler] scheduler.TaskSchedulerImpl (Logging.scala:logError(73)): Lost executor 1 on 10.0.4.209: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.

  2. ERROR [dispatcher-CoarseGrainedScheduler] scheduler.TaskSchedulerImpl (Logging.scala:logError(73)): Lost executor 2 on 10.0.4.209: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.

About the job: Creating JDBC connection to the Postgres database and writing the data into S3 as parquet files

How do I perform the full load? I need to add the incremental logic later.

1 Answer

Hello,

The above error occurs when an executor requires more memory to process the data than is configured. For a G.1X worker, Glue uses 10 GB of executor memory, and for G.2X it uses 20 GB. By default, Spark reads from JDBC sources are not parallel and use a single connection to read the entire table. To resolve this issue, read the JDBC table in parallel.

You can use the hashexpression or hashfield option along with hashpartitions to read the data in parallel using a Glue dynamic frame. Please refer to this article for a detailed explanation:

https://aws.amazon.com/premiumsupport/knowledge-center/glue-lost-nodes-rds-s3-migration/
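A minimal PySpark sketch of that parallel read, assuming a Glue connection named postgres-conn, a source table public.orders, and an evenly distributed numeric column id — all placeholders for your own setup:

```python
# Sketch: parallel JDBC read in an AWS Glue job using hashfield/hashpartitions.
# Connection, table, and column names below are placeholders.
connection_options = {
    "useConnectionProperties": "true",
    "connectionName": "postgres-conn",  # your Glue connection (placeholder)
    "dbtable": "public.orders",         # source table (placeholder)
    "hashfield": "id",                  # evenly distributed column to split on
    "hashpartitions": "10",             # number of parallel JDBC reads
}

def run(glue_context, output_path):
    """Read the table in parallel and write it to S3 as Parquet."""
    dyf = glue_context.create_dynamic_frame.from_options(
        connection_type="postgresql",
        connection_options=connection_options,
    )
    glue_context.write_dynamic_frame.from_options(
        frame=dyf,
        connection_type="s3",
        connection_options={"path": output_path},
        format="parquet",
    )
```

With hashpartitions set to 10, each of the 10 workers reads a slice of the table instead of one executor pulling all 17 million rows over a single connection.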

Regarding the incremental load, you can use the Glue job bookmark feature to read data from JDBC sources. Please refer to this article, as there are some prerequisites for using bookmarks with JDBC sources:

https://docs.aws.amazon.com/glue/latest/dg/monitor-continuations.html
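A sketch of the bookmark plumbing, assuming bookmarks are enabled on the job itself (--job-bookmark-option job-bookmark-enable); the function and transformation_ctx names are placeholders:

```python
import sys

# Sketch: job-bookmark plumbing for incremental JDBC loads.
# Requires bookmarks enabled on the job; names here are placeholders.
def run(glue_context, connection_options, output_path):
    from awsglue.job import Job
    from awsglue.utils import getResolvedOptions

    args = getResolvedOptions(sys.argv, ["JOB_NAME"])
    job = Job(glue_context)
    job.init(args["JOB_NAME"], args)  # restores the last bookmark state

    # transformation_ctx names this read so Glue can track processed rows.
    dyf = glue_context.create_dynamic_frame.from_options(
        connection_type="postgresql",
        connection_options=connection_options,
        transformation_ctx="read_postgres",
    )
    glue_context.write_dynamic_frame.from_options(
        frame=dyf,
        connection_type="s3",
        connection_options={"path": output_path},
        format="parquet",
        transformation_ctx="write_s3",
    )
    job.commit()  # persists the new bookmark state
```

On each run after the first, Glue uses the bookmark column (for JDBC sources, a sequentially increasing key) to read only rows added since the last committed run.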

If you still face any issues, please feel free to reach out to AWS Premium Support with your script and job run ID, and we will be happy to help.

answered 5 months ago
