AWS Glue Streaming 4.0 Docker Image - issues with missing class

I want to run my Glue Streaming job locally in a Docker container (amazon/aws-glue-streaming-libs:glue_streaming_libs_4.0.0_image_01) to better troubleshoot memory issues, but I hit this error when the job tried to access S3 for checkpointing:

org.apache.spark.util.TaskCompletionListenerException: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3.EMRFSDelegate not found

The same code works perfectly when run on Glue. It also works in the Docker container if I use

s3a://bucket_name/path/in/the/bucket/

instead of

s3://bucket_name/path/in/the/bucket/

but I assume S3A is not the preferred way to access S3 from Glue.

Is there something I am missing in my local configuration? Are some additional JARs needed for this?

Asked 7 months ago · Viewed 266 times
1 Answer
Yes, it sounds like the local image is missing the EMRFS library, which is where the s3:// scheme looks for the EMRFSDelegate class. It's completely fine to use s3a.
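If the same script has to run both on Glue and in the local container, one option is to switch the scheme only for local runs. The following is a minimal sketch, not an official Glue utility: the helper name `local_s3_uri` and the `RUN_LOCAL` environment variable are hypothetical names introduced here for illustration.

```python
import os


def local_s3_uri(uri: str, running_locally: bool) -> str:
    """Swap the s3:// scheme for s3a:// when running locally.

    Hypothetical helper: on Glue the URI is returned unchanged, so the job
    keeps using the s3:// scheme there.
    """
    prefix = "s3://"
    if running_locally and uri.startswith(prefix):
        return "s3a://" + uri[len(prefix):]
    return uri


# RUN_LOCAL is an assumed, user-defined variable set when starting the container.
running_locally = os.environ.get("RUN_LOCAL") == "1"
checkpoint_path = local_s3_uri("s3://bucket_name/path/in/the/bucket/", running_locally)

# checkpoint_path can then be passed to the streaming writer, for example:
#   df.writeStream.option("checkpointLocation", checkpoint_path)
```

Started with `RUN_LOCAL=1` in the container, the checkpoint location becomes s3a://bucket_name/path/in/the/bucket/, while on Glue the original s3:// path is kept.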

AWS Expert · Answered 7 months ago
