Does AWS Glue have an issue with multi-threaded loads

0

I'm experiencing a strange issue with AWS Glue. To speed up loads, I'm running multiple threads (spark.scheduler.mode=FAIR, Python multiprocessing.pool.ThreadPool with thread_count=5). Each thread loads a specific JDBC database table (glueContext.create_dynamic_frame.from_options(**options)) and uses job bookmarks to handle deltas.

What happens is that each thread starts and logs the table it should be loading. After the log entry comes the create_dynamic_frame.from_options() call, and all loads seem to stop there. Nothing happens from that point on, and the job ultimately times out. The next step would be to write the result to an S3 bucket, but that never happens. Sometimes, when the job is re-deployed or executed manually several times, it completes, but that's really rare. This seems like a race condition of some sort...

Does Glue have any limitations or issues with Spark threading? Does anyone have a properly functioning JDBC load running in multiple threads?
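For reference, the threading pattern described above is roughly the following (a minimal sketch: `load_table` and the table names are placeholders for the real per-table Glue read with job bookmarks and the S3 write):

```python
from multiprocessing.pool import ThreadPool

# Placeholder per-table loader; in the real job this wraps
# glueContext.create_dynamic_frame.from_options(**options)
# (with job bookmarks enabled) plus the write back to S3.
def load_table(table_name):
    print(f"loading {table_name}")  # each thread logs its table first
    return table_name

tables = ["orders", "customers", "invoices"]  # hypothetical table names

# 5 worker threads, as in the job described above
with ThreadPool(5) as pool:
    loaded = pool.map(load_table, tables)
```

With spark.scheduler.mode=FAIR, each thread's Spark jobs are supposed to share executors rather than queue strictly FIFO, which is why this pattern is commonly used to parallelize per-table loads.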

posted 2 years ago · 213 views

3 Answers
0

Hi, if the issue seems limited to JDBC loads, have you also tried monitoring the source database?

Are you sure you are not experiencing query timeouts while reading from the database? Depending on the options you are using and the number of concurrent loads (is it 5, or more?), you might be submitting more queries than you expect to the database, and it might be starting to slow down.

Try to monitor the job by looking at the Glue metrics and the Spark UI, and at the same time monitor the DB you read from, to understand where the slowdown may actually be occurring.
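One knob worth checking here is how many parallel JDBC queries each per-table load issues. A sketch of the relevant connection options follows (all concrete values are assumptions; `hashfield` and `hashpartitions` are the Glue JDBC read-partitioning options, and `hashpartitions` defaults to 7):

```python
# Hypothetical connection options for glueContext.create_dynamic_frame.from_options.
# With 5 threads and hashpartitions=2, up to 5 * 2 = 10 queries
# can hit the source database at the same time.
connection_options = {
    "url": "jdbc:mysql://example-host:3306/mydb",  # assumption: source endpoint
    "user": "glue_user",                           # assumption
    "password": "***",
    "dbtable": "orders",                           # set per thread
    "hashfield": "id",       # column used to split the read into partitions
    "hashpartitions": "2",   # parallel queries per table (default is 7)
}
```

Lowering `hashpartitions` (or the thread count) reduces the total concurrent query load, which can help distinguish a database bottleneck from a Glue-side issue.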

Hope it helps,

AWS
EXPERT
answered 2 years ago
  • Hi,

    Thanks for the tips! We've monitored the source database and there's nothing of significance there. At the beginning of the loads the queries execute and we can see them, but then they stop. It looks like the database returns the result, but Glue never processes it.

    We're now in the process of dropping Glue and handling the bookmarking by ourselves.

0

I just realized we have working multi-threaded processing jobs. They load data from S3 and, after transformations etc., dump the data back into S3 in a new format. Those are running nicely, so the issue seems to be specific to JDBC loads.

answered 2 years ago
0

Another note - this seems to be related to the number of tables. The failing job loads around 850 tables. The same code works fine with loads of around 100 tables or fewer. This might support the race condition theory.

answered 2 years ago
