AWS Glue Streaming 4.0 Docker Image - issues with missing class


I want to run my Glue Streaming job locally in a Docker container (amazon/aws-glue-streaming-libs:glue_streaming_libs_4.0.0_image_01) so I can troubleshoot memory issues more easily. However, I get this error when the job tries to access S3 for checkpointing:

org.apache.spark.util.TaskCompletionListenerException: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3.EMRFSDelegate not found

The same code works perfectly when run on Glue. It also works in the Docker container if I use s3a://bucket_name/path/in/the/bucket/ instead of s3://bucket_name/path/in/the/bucket/, but I assume S3A is not the preferred way to access S3 from Glue.

Is there something I am missing in my local configuration? Are some additional JARs needed for this?

1 Answer

Yes, it sounds like the image is missing the EMRFS library. It's completely fine to use s3a.
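For illustration, here is a minimal sketch of switching the checkpoint URI to s3a:// only for local runs, so the job keeps using s3:// (EMRFS) when it runs on Glue. The helper names `to_s3a` and `checkpoint_location` and the `running_locally` flag are hypothetical. How you detect a local run (an environment variable, a job argument, etc.) is left to you.

```python
def to_s3a(uri: str) -> str:
    """Rewrite an s3:// URI to s3a://; any other scheme is returned unchanged."""
    prefix = "s3://"
    if uri.startswith(prefix):
        return "s3a://" + uri[len(prefix):]
    return uri


def checkpoint_location(uri: str, running_locally: bool) -> str:
    """Hypothetical helper: use S3A in the local Docker image, keep s3:// on Glue."""
    # On Glue, s3:// is served by EMRFS; locally the EMRFS classes are missing,
    # so fall back to the S3A connector, which the question reports works there.
    return to_s3a(uri) if running_locally else uri


if __name__ == "__main__":
    path = "s3://bucket_name/path/in/the/bucket/"
    print(checkpoint_location(path, running_locally=True))
    # Example use with a PySpark streaming writer (not executed here):
    # df.writeStream.option("checkpointLocation",
    #                       checkpoint_location(path, running_locally=True))
```

This assumes the container already has working AWS credentials and the S3A connector on its classpath, which is consistent with the question's observation that s3a:// paths work locally.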

AWS
EXPERT
answered 7 months ago
