1 Answer
It looks like you're having trouble with the S3 shuffle storage plugin for Glue in your EMR Serverless setup with Spark 3.4. The plugin might not be fully compatible with Spark 3.4, even though it worked with Spark 3.3.
You may want to try these suggestions to see if they help resolve the issue:
- If possible, consider downgrading to Spark 3.3 to ensure compatibility with the plugin. However, this might not be feasible if you require features exclusive to Spark 3.4 or if you need the interactive endpoints available in EMR Serverless.
- If downgrading is not an option, you might need to implement a custom version of the plugin that is compatible with Spark 3.4. This could involve modifying the plugin's source code to add the missing method implementation.
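As a rough illustration of the first suggestion, a job launcher could attach the plugin only on Spark versions where it has been verified to work and otherwise leave Spark's default shuffle in place. This is a minimal sketch, not a verified fix: `spark.shuffle.sort.io.plugin.class` is the standard Spark setting for the shuffle I/O plugin, while the function name `shuffle_conf`, the `known_good` allow-list, and the placeholder class name are assumptions made for illustration.

```python
def shuffle_conf(spark_version, plugin_class, known_good=((3, 3),)):
    """Return Spark conf entries enabling a shuffle I/O plugin, or {} to keep the default.

    `known_good` is a hypothetical allow-list of (major, minor) versions; adjust it
    to whatever you have actually verified in your own environment.
    """
    # Keep only major.minor, e.g. "3.4.1" -> (3, 4)
    major_minor = tuple(int(part) for part in spark_version.split(".")[:2])
    if major_minor in known_good:
        return {"spark.shuffle.sort.io.plugin.class": plugin_class}
    # Unverified version: add nothing, so Spark uses its built-in shuffle I/O
    return {}


# Placeholder class name (assumption): substitute the class your plugin documents.
PLUGIN = "com.example.shuffle.MyS3ShufflePlugin"
print(shuffle_conf("3.3.2", PLUGIN))
print(shuffle_conf("3.4.1", PLUGIN))
```

The returned dictionary can be merged into whatever `--conf` arguments the job already passes, so an unverified version simply runs without the plugin instead of failing at startup.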
Unfortunately, I can't guarantee that either approach will work, but exploring these options might help you get S3 shuffle storage working in your EMR Serverless environment.
Unfortunately, the plug-in is made by Amazon and is closed source, so there is no way for me to fix it myself.
I am not even sure it will work with Spark 3.3, since 3.3 has the same method that fails; that method has existed since Spark 3.0. I think there must be some difference between the Spark running in Glue and the Spark in EMR Serverless that makes the plug-in work only in Glue.