I want to run my Glue Streaming job locally in a Docker container (amazon/aws-glue-streaming-libs:glue_streaming_libs_4.0.0_image_01) to better troubleshoot memory issues, but the job fails with this error when it tries to access S3 for checkpointing:
org.apache.spark.util.TaskCompletionListenerException: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3.EMRFSDelegate not found
The same code works perfectly when run on Glue. It also works in the Docker container if I use:
s3a://bucket_name/path/in/the/bucket/
instead of
s3://bucket_name/path/in/the/bucket/
but I assume S3A is not the preferred way to access S3 from Glue.
Is there something I am missing in my local configuration? Are some additional JARs needed for this?
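
For reference, here is a minimal sketch of the two local-only workarounds implied above. It is untested against this image and assumes the failure comes from the s3:// scheme being bound to an EMRFS class that the container does not ship. The names LOCAL_S3_OVERRIDES, checkpoint_location and apply_local_overrides are hypothetical; only the Hadoop key fs.s3.impl and the class org.apache.hadoop.fs.s3a.S3AFileSystem are real.

```python
# Option A: rebind the s3:// scheme to the S3A connector for local runs only.
# "fs.s3.impl" is the standard Hadoop fs.<scheme>.impl key; the "spark.hadoop."
# prefix lets Spark pass it through to the Hadoop configuration.
LOCAL_S3_OVERRIDES = {
    "spark.hadoop.fs.s3.impl": "org.apache.hadoop.fs.s3a.S3AFileSystem",
}


def apply_local_overrides(builder, running_locally):
    """Apply the overrides to a SparkSession.builder-like object.

    Anything exposing .config(key, value) works; on Glue nothing is changed.
    """
    if running_locally:
        for key, value in LOCAL_S3_OVERRIDES.items():
            builder = builder.config(key, value)
    return builder


# Option B: leave the Spark configuration alone and rewrite only the path.
def checkpoint_location(path, running_locally):
    """Return the path unchanged on Glue, or with s3:// swapped for s3a:// locally."""
    prefix = "s3://"
    if running_locally and path.startswith(prefix):
        return "s3a://" + path[len(prefix):]
    return path
```

With either option the job code that runs on Glue keeps using s3:// paths, and only the local container goes through S3A. Whether the override in option A takes effect inside this image is an assumption I have not verified.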