
Questions tagged with Amazon Elastic MapReduce


Executing Hive CREATE TABLE in spark.sql -- java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.S3AFileSystem not found

**1. The issue:** We have a Spark EMR cluster that connects to a remote Hive metastore to use our EMR Hive data warehouse. When executing this PySpark statement in a Zeppelin notebook:

```
sc.sql("create table userdb_emr_search.test_table (id int, attr string)")
```

we got this exception:

```
java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.S3AFileSystem not found
```

**2. EMR Spark cluster configuration:**

- Release label: emr-6.3.0
- Hadoop distribution: Amazon 3.2.1
- Applications: Spark 3.1.1, JupyterHub 1.2.0, Ganglia 3.7.2, Zeppelin 0.9.0

**3.** The class org.apache.hadoop.fs.s3a.S3AFileSystem is on the Spark classpath correctly:

```
'spark.executor.extraClassPath', '....:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:....
'spark.driver.extraClassPath',   '....:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:....
```

**4. The jar files are in the right places.** Under /usr/lib/hadoop:

```
-rw-r--r-- 1 root root  501704 Mar 30  2021 hadoop-aws-3.2.1-amzn-3.jar
lrwxrwxrwx 1 root root      27 Sep  8 01:50 hadoop-aws.jar -> hadoop-aws-3.2.1-amzn-3.jar
-rw-r--r-- 1 root root 4175105 Mar 30  2021 hadoop-common-3.2.1-amzn-3.jar
lrwxrwxrwx 1 root root      30 Sep  8 01:50 hadoop-common.jar -> hadoop-common-3.2.1-amzn-3.jar
```

Under /usr/share/aws/aws-java-sdk/:

```
-rw-r--r-- 1 root root 216879203 Apr  1  2021 aws-java-sdk-bundle-1.11.977.jar
```

**5. Hadoop storage:** Amazon S3 is used for Hadoop storage instead of HDFS.

**6. Error log when executing the Spark SQL CREATE TABLE in the Zeppelin notebook:**

```
WARN [2022-09-05 03:24:11,785] ({SchedulerFactory3} NotebookServer.java[onStatusChange]:1928) - Job paragraph_1662330571651_66787638 is finished, status: ERROR, exception: null, result: %text Fail to execute line 2: sc.sql("create table userdb_emr_search.test_table (id int, attr string)")
Traceback (most recent call last):
  File "/tmp/1662348163304-0/zeppelin_python.py", line 158, in <module>
    exec(code, _zcUserQueryNameSpace)
  File "<stdin>", line 2, in <module>
  File "/usr/lib/spark/python/pyspark/sql/session.py", line 723, in sql
    return DataFrame(self._jsparkSession.sql(sqlQuery), self._wrapped)
  File "/usr/lib/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1305, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "/usr/lib/spark/python/pyspark/sql/utils.py", line 117, in deco
    raise converted from None
pyspark.sql.utils.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.S3AFileSystem not found)
INFO [2022-09-05 03:24:11,785] ({SchedulerFactory3} VFSNotebookRepo.java[save]:144) - Saving note 2HDK22P2Z to Untitled Note 1_2HDK22P2Z.zpln
```

Please help investigate why Spark SQL cannot see the class org.apache.hadoop.fs.s3a.S3AFileSystem even though its jar files are in the right place and on the classpath.
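Since the exception surfaces through HiveException/MetaException rather than as a plain Spark error, one way to narrow it down is to test whether the driver JVM itself can load the class. Below is a minimal diagnostic sketch, not a confirmed fix, assuming `spark` is the SparkSession Zeppelin exposes (the question calls it `sc`); it relies on PySpark's internal `_jsc`/`_jvm` handles, which are not public API:

```python
# Diagnostic sketch: check what filesystem implementation Hadoop is
# configured with, and whether the driver JVM can load S3AFileSystem.
jconf = spark.sparkContext._jsc.hadoopConfiguration()
print("fs.s3a.impl:", jconf.get("fs.s3a.impl"))  # often org.apache.hadoop.fs.s3a.S3AFileSystem
print("driver extraClassPath:", spark.conf.get("spark.driver.extraClassPath", "not set"))

# Try to load the class directly in the driver JVM; a ClassNotFoundException
# here means the jar really is missing from the driver's classpath.
try:
    spark.sparkContext._jvm.java.lang.Class.forName(
        "org.apache.hadoop.fs.s3a.S3AFileSystem")
    print("S3AFileSystem is loadable from the driver")
except Exception as e:
    print("driver cannot load S3AFileSystem:", e)
```

If the class loads fine in the driver, the ClassNotFoundException is likely being raised on the remote Hive metastore side, which has its own classpath independent of spark.driver.extraClassPath.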
1 answer · 0 votes · 41 views · asked 18 days ago

Why does running a custom JAR on AWS EMR give a file system error - error=2, No such file or directory

I'm trying to set up a JupyterHub environment in AWS EMR. I've been following the instructions in the [documentation](https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-jupyterhub.html) without issue. I now want to add a step during setup to add users. The documentation gives a snippet on how to do so.

> **Example: Bash script to add multiple users**
>
> The following sample bash script ties together the previous steps in
> this section to create multiple JupyterHub users. The script can be
> run directly on the main node, or it can be uploaded to Amazon S3
> and then run as a step.
>
>     # Bulk add users to container and JupyterHub with temp password of username
>     set -x
>     USERS=(shirley diego ana richard li john mary anaya)
>     TOKEN=$(sudo docker exec jupyterhub /opt/conda/bin/jupyterhub token jovyan | tail -1)
>     for i in "${USERS[@]}";
>     do
>        sudo docker exec jupyterhub useradd -m -s /bin/bash -N $i
>        sudo docker exec jupyterhub bash -c "echo $i:$i | chpasswd"
>        curl -XPOST --silent -k https://$(hostname):9443/hub/api/users/$i \
>          -H "Authorization: token $TOKEN" | jq
>     done
>
> Save the script to a location in Amazon S3 such as
> s3://mybucket/createjupyterusers.sh. Then you can use
> script-runner.jar to run it as a step.

https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-jupyterhub-pam-users.html

After following the procedure above I successfully launched the EMR cluster. I now want to use my own script in place of the shell script above, but I'm having issues running it. I get the following error:

```
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/lib/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/tez/lib/slf4j-reload4j-1.7.36.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Exception in thread "main" java.lang.RuntimeException: java.io.IOException: Cannot run program "/mnt/var/lib/hadoop/steps/s-23QLFU7JXPPM7/./add_users_ERM.sh" (in directory "."): error=2, No such file or directory
	at com.amazon.elasticmapreduce.scriptrunner.ProcessRunner.exec(ProcessRunner.java:143)
	at com.amazon.elasticmapreduce.scriptrunner.ScriptRunner.main(ScriptRunner.java:58)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.hadoop.util.RunJar.run(RunJar.java:323)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:236)
Caused by: java.io.IOException: Cannot run program "/mnt/var/lib/hadoop/steps/s-23QLFU7JXPPM7/./add_users_ERM_Linuxx.sh" (in directory "."): error=2, No such file or directory
	at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
	at com.amazon.elasticmapreduce.scriptrunner.ProcessRunner.exec(ProcessRunner.java:96)
	... 7 more
Caused by: java.io.IOException: error=2, No such file or directory
	at java.lang.UNIXProcess.forkAndExec(Native Method)
	at java.lang.UNIXProcess.<init>(UNIXProcess.java:247)
	at java.lang.ProcessImpl.start(ProcessImpl.java:134)
	at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
	... 8 more
```

*Could someone explain why running script-runner.jar to run my script fails but works fine when using the shell script in the tutorial?*

----------

**add_users_ERM.sh for reference:**

```
#!/opt/conda/bin/python
import os
import subprocess
import traceback
import sys

TOKEN="$(sudo docker exec jupyterhub /opt/conda/bin/jupyterhub token jovyan | tail -1)"

def users_from_text(file):
    #Now we have users and their team, we can create a user account and assign them to a team
    for user in file:
        username = user
        print(f"Adding {user}")
        cmd = ["sudo", "docker", "exec", "jupyterhub", "useradd", "-m", "-s", "/bin/bash", "-N", username]
        p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        output, error = p.communicate()
        output = output.strip().decode("utf-8")
        error = error.decode("utf-8")
        if p.returncode != 0:
            print(f"Error adding user: {error}")
        else:
            print(F"{user} was added")
        cmd = ["sudo", "docker", "exec", "jupyterhub", "bash", "-c", f"echo {username}:{username} | chpasswd"]
        p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        output, error = p.communicate()
        output = output.strip().decode("utf-8")
        error = error.decode("utf-8")
        if p.returncode != 0:
            print(f"Error adding password: {error}")
        else:
            print(F"{user} password was added")
        cmd = ["curl", "-XPOST", "--silent", "-k", f"https://$(hostname):9443/hub/api/users/{username}", "-H", f"Authorization: token {TOKEN}", "|", "jq"]
        p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        output, error = p.communicate()
        output = output.strip().decode("utf-8")
        error = error.decode("utf-8")
        if p.returncode != 0:
            print(f"Error adding user to JH: {error}")
        else:
            print(F"{user} was added to JH")
    #To do: Convert api call to subprocess request
    return output

test_data = ["worker_1", "worker_2", "worker_3", "worker_4", "worker_5", "worker_6"]
txt_file = test_data
print("Attempting add_user.sh script")
output = users_from_text(txt_file)
```
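For context on what `error=2` means here: the kernel returns ENOENT from exec() not only when the script file itself is missing, but also when the interpreter named in its shebang does not exist on the machine running the step, or when CRLF line endings corrupt the shebang. Notably, `/opt/conda/bin/python` likely exists only inside the JupyterHub Docker container, while script-runner executes on the EMR host. A small diagnostic sketch to check both causes, run on the main node against a local copy of the script (the path is hypothetical):

```python
# Diagnostic sketch: inspect the shebang of a step script to see why
# exec() might fail with error=2 even though the file exists.
import os

path = "add_users_ERM.sh"  # hypothetical local copy of the step script

with open(path, "rb") as f:
    first_line = f.readline()

print("shebang bytes:", first_line)

if first_line.startswith(b"#!"):
    interpreter = first_line[2:].strip().split()[0].decode()
    # /opt/conda/bin/python lives inside the JupyterHub container;
    # script-runner runs on the EMR host, outside the container.
    print(interpreter, "exists on this host:", os.path.exists(interpreter))

if first_line.endswith(b"\r\n"):
    print("CRLF line endings detected; the shebang will not resolve")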
0 answers · 0 votes · 42 views · asked 2 months ago