AWS Glue Streaming 4.0 Docker Image - issues with missing class


I want to run my Glue Streaming job locally in a Docker container (amazon/aws-glue-streaming-libs:glue_streaming_libs_4.0.0_image_01) to better troubleshoot memory issues. When the job tries to access S3 for checkpointing, it fails with this error:

org.apache.spark.util.TaskCompletionListenerException: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3.EMRFSDelegate not found

The same code works perfectly when run on Glue. It also works in the Docker container if I use s3a://bucket_name/path/in/the/bucket/ instead of s3://bucket_name/path/in/the/bucket/, but I assume S3A is not the preferred way to access S3 from Glue.

Is there something I am missing in my local configuration? Are some additional JARs needed for this?

1 Answer

Yes, it sounds like the EMRFS library is missing. It's completely fine to use s3a.
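If the same script has to run both on Glue (with s3://) and in the local container (with s3a://), one option is to switch the scheme at startup. The following is only a minimal sketch: the helper name checkpoint_uri and the RUN_LOCAL environment variable are made up for illustration and are not part of Glue or Spark.

```python
import os


def checkpoint_uri(uri: str, local: bool) -> str:
    """Return the URI unchanged on Glue; use the s3a:// scheme when running locally."""
    prefix = "s3://"
    if local and uri.startswith(prefix):
        # Only the scheme changes; bucket and key stay the same.
        return "s3a://" + uri[len(prefix):]
    return uri


# RUN_LOCAL is a hypothetical flag you would set yourself in the Docker container.
run_local = os.environ.get("RUN_LOCAL", "0") == "1"
checkpoint = checkpoint_uri("s3://bucket_name/path/in/the/bucket/", run_local)

# The result would then be passed as the "checkpointLocation" option
# of the streaming job, exactly where the s3:// path was used before.
```

With RUN_LOCAL=1 set in the container, the job checkpoints through S3A; on Glue the variable is unset and the original s3:// path is used as is.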

AWS Expert, answered 7 months ago
