This is a known issue and was fixed in EMR 6.7.0 and later.
Root cause:
In EMR 6.6.0, Spark was upgraded to 3.2.0, which includes SPARK-33212, a change that makes Spark depend on the shaded Hadoop client jars (hadoop-client-api and hadoop-client-runtime) instead of the unshaded ones. The difference between the two sets of jars is that third-party classes in the shaded jars are relocated to different packages. The S3A classes, however, were built on the assumption that the unshaded client jars are used (i.e. third-party classes aren’t relocated).
When Spark creates an S3AFileSystem, the S3AFileSystem.create() method attempts to invoke the constructor of a Hadoop utility class (SemaphoredDelegatingExecutor), incorrectly assuming that it accepts the unshaded version of a Guava class [1]. In the jars Spark actually loads, the parameter type of this constructor is the shaded version of that class [2]. Because the JVM links a method call by its exact signature, no matching constructor is found and the JVM throws a NoSuchMethodError, causing the application to fail.
This issue was fixed in Hadoop 3.2.2 by HADOOP-16080, but Spark in EMR 6.6.0 uses Hadoop 3.2.1, which does not have this fix backported.
[1] com.google.common.util.concurrent.ListeningExecutorService
[2] org.apache.hadoop.shaded.com.google.common.util.concurrent.ListeningExecutorService
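To make the mismatch concrete, here is a minimal sketch in Python (not actual JVM or Hadoop code) of how a constructor call is matched by its descriptor string. The helper names are hypothetical, and the trailing `int` and `boolean` parameters are an assumption for illustration. Only the two class names [1] and [2] come from the explanation above.

```python
# JVM type descriptors for the primitive types used in this sketch.
PRIMITIVES = {"int": "I", "boolean": "Z"}


def type_descriptor(java_type):
    """Return the JVM descriptor for a primitive or a fully qualified class name."""
    if java_type in PRIMITIVES:
        return PRIMITIVES[java_type]
    return "L" + java_type.replace(".", "/") + ";"


def constructor_descriptor(param_types):
    """Build a constructor descriptor; constructors always return void (V)."""
    return "(" + "".join(type_descriptor(t) for t in param_types) + ")V"


def resolves(wanted, available):
    """A call links only if its descriptor matches an available one exactly."""
    return wanted in available


UNSHADED = "com.google.common.util.concurrent.ListeningExecutorService"  # [1]
SHADED = "org.apache.hadoop.shaded." + UNSHADED                          # [2]

# Assumption: the remaining parameters are (int, boolean), for illustration only.
# What the S3A code was compiled to call:
wanted = constructor_descriptor([UNSHADED, "int", "boolean"])
# What the shaded client jar actually provides:
available = {constructor_descriptor([SHADED, "int", "boolean"])}

if not resolves(wanted, available):
    print("No constructor matches", wanted, "-> NoSuchMethodError on the JVM")
```

The two descriptors differ only in the relocated package prefix, but the lookup is an exact match, so the call cannot be linked even though a constructor with the "same" logical parameter exists.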
If upgrading to EMR 6.7.0 or later is not an option, please reach out to AWS Support.