AWS Glue Streaming 4.0 Docker Image - issues with missing class

I want to run my Glue Streaming job locally in a Docker container (amazon/aws-glue-streaming-libs:glue_streaming_libs_4.0.0_image_01) to better troubleshoot memory issues. However, when the job tries to access S3 for checkpointing, it fails with:

org.apache.spark.util.TaskCompletionListenerException: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3.EMRFSDelegate not found

The same code works perfectly when run on Glue. It also works in the Docker container if I use s3a://bucket_name/path/in/the/bucket/ instead of s3://bucket_name/path/in/the/bucket/, but I assume S3A is not the preferred way to access S3 from Glue.

Is there something I am missing in my local configuration? Are some additional JARs needed for this?

1 Answer

Yes, it sounds like the EMRFS library is missing. It's completely fine to use s3a.
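As a minimal sketch of how the job could keep s3:// paths on Glue and switch to s3a:// only in the local container: the helper name `checkpoint_uri` and the `running_locally` flag are hypothetical and not part of any Glue or Spark API. How you detect a local run (for example, an environment variable you set yourself in the container) is up to you.

```python
def checkpoint_uri(uri: str, running_locally: bool) -> str:
    """Return the URI unchanged on Glue; swap s3:// for s3a:// locally.

    Hypothetical helper: the local image lacks the EMRFS classes behind
    the s3:// scheme, while s3a:// was reported to work in the container.
    """
    prefix = "s3://"
    if running_locally and uri.startswith(prefix):
        return "s3a://" + uri[len(prefix):]
    return uri


# Example with the placeholder path from the question.
local_path = checkpoint_uri("s3://bucket_name/path/in/the/bucket/", running_locally=True)
glue_path = checkpoint_uri("s3://bucket_name/path/in/the/bucket/", running_locally=False)
```

The result can then be passed wherever the job sets its checkpoint location, for example `.option("checkpointLocation", local_path)` on a Structured Streaming writer. An untested alternative, stated here only as an assumption for this image, is to map the s3 scheme to the S3A implementation through the Hadoop setting `spark.hadoop.fs.s3.impl=org.apache.hadoop.fs.s3a.S3AFileSystem`, which would leave the paths in the code untouched.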

AWS EXPERT, answered 7 months ago