Error when saving with the s3a file system on EMR 6.6.0 with Spark 3.2


We are trying to save data using s3a:// paths and get the error below, but saving the same data with s3:// paths works fine. Here is the stack trace:

Closing SparkContext
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.hadoop.util.SemaphoredDelegatingExecutor.<init>(Lcom/google/common/util/concurrent/ListeningExecutorService;IZ)V
    at org.apache.hadoop.fs.s3a.S3AFileSystem.create(S3AFileSystem.java:813)
    at org.apache.hadoop.fs.FileSystem.primitiveCreate(FileSystem.java:1235)
    at org.apache.hadoop.fs.DelegateToFileSystem.createInternal(DelegateToFileSystem.java:100)
    at org.apache.hadoop.fs.AbstractFileSystem.create(AbstractFileSystem.java:605)
    at org.apache.hadoop.fs.FileContext$3.next(FileContext.java:696)
    at org.apache.hadoop.fs.FileContext$3.next(FileContext.java:692)
    at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90)
    at org.apache.hadoop.fs.FileContext.create(FileContext.java:698)
    at org.apache.spark.sql.execution.streaming.FileContextBasedCheckpointFileManager.createTempFile(CheckpointFileManager.scala:327)
    at org.apache.spark.sql.execution.streaming.CheckpointFileManager$RenameBasedFSDataOutputStream.<init>(CheckpointFileManager.scala:140)
    at org.apache.spark.sql.execution.streaming.CheckpointFileManager$RenameBasedFSDataOutputStream.<init>(CheckpointFileManager.scala:143)
    at org.apache.spark.sql.execution.streaming.FileContextBasedCheckpointFileManager.createAtomic(CheckpointFileManager.scala:333)
    at org.apache.spark.sql.execution.streaming.StreamMetadata$.write(StreamMetadata.scala:79)
    at org.apache.spark.sql.execution.streaming.StreamExecution.$anonfun$streamMetadata$1(StreamExecution.scala:141)
    at scala.Option.getOrElse(Option.scala:189)

asked 2 years ago
1 Answer

This is a known issue, fixed in EMR 6.7.0 and later.

Root cause:

In EMR 6.6.0, Spark was upgraded to 3.2.0, which includes SPARK-33212, a change that makes Spark depend on the shaded Hadoop client jars (hadoop-client-api and hadoop-client-runtime) instead of the unshaded ones. In the shaded jars, third-party classes such as Guava are relocated to different packages. However, the S3A classes were built assuming that the unshaded client jars are used (that is, that third-party classes are not relocated).
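To illustrate what relocation does to a class name, here is a minimal Python sketch of prefix-based relocation, in the spirit of the Maven Shade plugin's relocation rules. The helper `relocate` is hypothetical, and the rule `com.google` -> `org.apache.hadoop.shaded.com.google` is an assumption chosen to reproduce the shaded Guava name quoted in this answer; it is not copied from the actual Hadoop build configuration.

```python
def relocate(class_name: str, pattern: str, shaded_pattern: str) -> str:
    """Rewrite class_name if it lives under the `pattern` package (hypothetical helper)."""
    # Require a dot after the prefix so e.g. "com.googlex.Foo" is not
    # mistaken for a class inside the "com.google" package.
    if class_name.startswith(pattern + "."):
        return shaded_pattern + class_name[len(pattern):]
    return class_name


# Assumed relocation rule, picked to match the shaded name in this answer.
PATTERN = "com.google"
SHADED = "org.apache.hadoop.shaded.com.google"

guava = "com.google.common.util.concurrent.ListeningExecutorService"
s3a = "org.apache.hadoop.fs.s3a.S3AFileSystem"

print(relocate(guava, PATTERN, SHADED))
# org.apache.hadoop.shaded.com.google.common.util.concurrent.ListeningExecutorService
print(relocate(s3a, PATTERN, SHADED))
# org.apache.hadoop.fs.s3a.S3AFileSystem
```

Inside the shaded jar every reference is rewritten consistently, but S3A classes compiled against the unshaded jars still refer to the original `com.google...` name, and that mismatch is what breaks at runtime.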

When Spark writes a file through S3A, the S3AFileSystem.create() method invokes the constructor of a Hadoop utility class (SemaphoredDelegatingExecutor), expecting its first parameter to be the unshaded version of a Guava class [1]. In the shaded jars that Spark uses, the actual parameter type of this constructor is the relocated version of that class [2]. Because the JVM links a call by the method name plus its exact parameter types, no matching constructor is found, so the JVM throws NoSuchMethodError and the application fails.
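The `(...)V` string in the error message is a JVM method descriptor: parameter types inside the parentheses, return type after. A small Python sketch that decodes it makes the mismatch visible. `parse_method_descriptor` is a hypothetical helper, and the "actual" descriptor below is an assumption built from [2], assuming the other two parameters are unchanged.

```python
# JVM descriptor characters for primitive types (JVMS section 4.3.2).
PRIMITIVES = {
    "B": "byte", "C": "char", "D": "double", "F": "float",
    "I": "int", "J": "long", "S": "short", "Z": "boolean", "V": "void",
}


def parse_type(desc: str, i: int) -> tuple[str, int]:
    """Decode one type starting at desc[i]; return (java_name, next_index)."""
    dims = 0
    while desc[i] == "[":  # each '[' adds one array dimension
        dims += 1
        i += 1
    if desc[i] == "L":  # object type: L<internal/name>;
        end = desc.index(";", i)
        name = desc[i + 1:end].replace("/", ".")
        i = end + 1
    else:
        name = PRIMITIVES[desc[i]]
        i += 1
    return name + "[]" * dims, i


def parse_method_descriptor(desc: str) -> tuple[list[str], str]:
    """Split '(params)return' into (parameter types, return type)."""
    if not desc.startswith("("):
        raise ValueError(f"not a method descriptor: {desc!r}")
    params, i = [], 1
    while desc[i] != ")":
        name, i = parse_type(desc, i)
        params.append(name)
    ret, _ = parse_type(desc, i + 1)
    return params, ret


# Descriptor taken from the NoSuchMethodError in the stack trace.
expected = "(Lcom/google/common/util/concurrent/ListeningExecutorService;IZ)V"
# Assumed descriptor of the constructor in the shaded jar, per [2].
actual = "(Lorg/apache/hadoop/shaded/com/google/common/util/concurrent/ListeningExecutorService;IZ)V"

print(parse_method_descriptor(expected))
# (['com.google.common.util.concurrent.ListeningExecutorService', 'int', 'boolean'], 'void')
print(expected == actual)
# False
```

The two descriptors differ only in the package of the first parameter, which is enough for the JVM to treat them as different constructors.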

This issue was fixed in Hadoop 3.2.2 by HADOOP-16080, but Spark in EMR 6.6.0 uses Hadoop 3.2.1, which does not include a backport of this fix.

[1] com.google.common.util.concurrent.ListeningExecutorService

[2] org.apache.hadoop.shaded.com.google.common.util.concurrent.ListeningExecutorService

If upgrading to EMR 6.7 is not an option, please reach out to AWS Support.

AWS
answered a year ago
