I changed the worker_type from G.1X to G.2X and the job completed successfully, albeit in 38 hours. I then tuned the Spark code so that the 3 DataFrames are all partitioned on the same key with .repartition("attribute_name"), and also doubled the number of workers from 5 to 10. With those changes the job completed in 1 hr 20 mins. The partitioning helped the JOIN that produces the final dataset written to S3.
Hi,
Have you looked at the documentation about migrating AWS Glue from version 2.0 to 3.0, and Spark 2 to Spark 3? Do you use external libraries?
If you cannot find any indication in that documentation, contacting AWS Support might be the fastest way to resolve your issue; without seeing the job itself it is difficult to provide more prescriptive guidance.
Hope this helps.
Thank you for the feedback. How long was it taking with Glue 2.0? Were you using the same number of nodes?