AWS Glue Streaming 4.0 Docker Image - issues with missing class


I want to run my Glue Streaming job locally in a Docker container (amazon/aws-glue-streaming-libs:glue_streaming_libs_4.0.0_image_01) to better troubleshoot memory issues. However, when the job tried to access S3 for checkpointing, it failed with this error:

org.apache.spark.util.TaskCompletionListenerException: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3.EMRFSDelegate not found

The same code works perfectly when run on Glue. It also works in the Docker container if I use s3a://bucket_name/path/in/the/bucket/ instead of s3://bucket_name/path/in/the/bucket/, but I guess S3A is not the preferred way to access S3 from Glue.

Is there something I am missing in my local configuration? Are some additional JARs needed for this?

1 Answer

Yes, it sounds like the local image is missing the EMRFS library, which handles the s3:// scheme on Glue. It's completely fine to use s3a instead.
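If the same script has to run both on Glue and in the local container, one option is to switch the URI scheme only for local runs. The sketch below is a minimal illustration, not part of any Glue API: the helper name `to_s3a` and the `running_locally` flag are hypothetical, and how you detect a local run (an environment variable, a job argument) is up to you.

```python
def to_s3a(uri: str, running_locally: bool) -> str:
    """Swap the s3:// scheme for s3a:// when running locally.

    Hypothetical helper: on Glue the URI is returned unchanged, so the
    job keeps using the s3:// scheme there.
    """
    prefix = "s3://"
    if running_locally and uri.startswith(prefix):
        return "s3a://" + uri[len(prefix):]
    return uri


# Example with the placeholder path from the question
checkpoint_path = to_s3a("s3://bucket_name/path/in/the/bucket/", running_locally=True)
print(checkpoint_path)
```

The resulting path can then be passed wherever the job sets its checkpoint location, for example the `checkpointLocation` option of a Structured Streaming writer.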

AWS EXPERT, answered 6 months ago
