
Questions tagged with Amazon Elastic MapReduce


Bootstrap failure due to the arm64 version of numpy required for r6 instances?

Was trying to upgrade from r5s to the latest r6 instances and ran into an issue installing numpy via pip in our bootstrap script. [This post](https://repost.aws/questions/QUdF4dL0k9RTeAZaUFiPDJCw/emr-bootstrap-script-with-pip-numpy-installation-fails-on-r-6-instances) is related to my issue. Was anyone able to resolve this without building your own wheel file of the arm64 version of numpy?

EC2/EMR cluster config:

```
Release label: emr-6.5.0
Instance Type: r6gd.8xlarge
```

Snippet of the bootstrap script:

```bash
#!/bin/bash

# python version
pyv="$(python3 -V 2>&1)"
echo "Python version: $pyv"

# misc code to link up the requirements.txt

echo "`date -u` install python dependencies"

# Install Python deps
sudo python3 -m pip install wheel
sudo python3 -m pip install -r requirements.txt
```

requirements.txt:

```
boto3==1.18.46
Cython==0.29.24
pandas==1.3.3
numpy==1.21.2
```

Log output:

```
+ echo 'Python version: Python 3.7.10'
...
+ echo 'Thu Sep 29 22:16:46 UTC 2022 install python dependencies'
+ sudo python3 -m pip install wheel
WARNING: Running pip install with root privileges is generally not a good idea. Try `python3 -m pip install --user` instead.
WARNING: The script wheel is installed in '/usr/local/bin' which is not on PATH.
Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
+ sudo python3 -m pip install -r job-requirements.txt
WARNING: Running pip install with root privileges is generally not a good idea. Try `python3 -m pip install --user` instead.
ERROR: Command errored out with exit status 1:
  command: /bin/python3 /usr/lib/python3.7/site-packages/pip install --ignore-installed --no-user --prefix /mnt/tmp/pip-build-env-yy928eo_/overlay --no-warn-script-location --no-binary :none: --only-binary :none: -i https://pypi.org/simple -- 'cython >= 0.29' 'numpy==1.14.5; python_version<'"'"'3.7'"'"'' 'numpy==1.16.0; python_version>='"'"'3.7'"'"'' setuptools setuptools_scm wheel
  cwd: None
  Complete output (866 lines):
  WARNING: Running pip install with root privileges is generally not a good idea. Try `pip install --user` instead.
  Ignoring numpy: markers 'python_version < "3.7"' don't match your environment
  Collecting cython>=0.29
    Using cached Cython-0.29.32-cp37-cp37m-manylinux_2_17_aarch64.manylinux2014_aarch64.manylinux_2_24_aarch64.whl (1.8 MB)
  Collecting numpy==1.16.0
    Downloading numpy-1.16.0.zip (5.1 MB)
  Collecting setuptools
    Downloading setuptools-65.4.0-py3-none-any.whl (1.2 MB)
  Collecting setuptools_scm
    Downloading setuptools_scm-7.0.5-py3-none-any.whl (42 kB)
  Collecting wheel
    Using cached wheel-0.37.1-py2.py3-none-any.whl (35 kB)
  Collecting packaging>=20.0
    Downloading packaging-21.3-py3-none-any.whl (40 kB)
  Collecting tomli>=1.0.0
    Downloading tomli-2.0.1-py3-none-any.whl (12 kB)
  Collecting typing-extensions
    Downloading typing_extensions-4.3.0-py3-none-any.whl (25 kB)
  Collecting importlib-metadata; python_version < "3.8"
    Downloading importlib_metadata-4.12.0-py3-none-any.whl (21 kB)
  Collecting pyparsing!=3.0.5,>=2.0.2
    Downloading pyparsing-3.0.9-py3-none-any.whl (98 kB)
  Collecting zipp>=0.5
    Downloading zipp-3.8.1-py3-none-any.whl (5.6 kB)
  ...
  _configtest.c:1:10: fatal error: Python.h: No such file or directory
   #include <Python.h>
            ^~~~~~~~~~
  compilation terminated.
  failure.
  removing: _configtest.c _configtest.o
  Traceback (most recent call last):
    File "<string>", line 1, in <module>
    File "/mnt/tmp/pip-install-tl9eju6y/numpy/setup.py", line 415, in <module>
      setup_package()
    File "/mnt/tmp/pip-install-tl9eju6y/numpy/setup.py", line 407, in setup_package
      setup(**metadata)
    File "/mnt/tmp/pip-install-tl9eju6y/numpy/numpy/distutils/core.py", line 171, in setup
      return old_setup(**new_attr)
    File "/usr/lib/python3.7/site-packages/setuptools/__init__.py", line 165, in setup
      return distutils.core.setup(**attrs)
    File "/usr/lib64/python3.7/distutils/core.py", line 148, in setup
      dist.run_commands()
    File "/usr/lib64/python3.7/distutils/dist.py", line 966, in run_commands
      self.run_command(cmd)
    File "/usr/lib64/python3.7/distutils/dist.py", line 985, in run_command
      cmd_obj.run()
    File "/mnt/tmp/pip-install-tl9eju6y/numpy/numpy/distutils/command/install.py", line 62, in run
      r = self.setuptools_run()
    File "/mnt/tmp/pip-install-tl9eju6y/numpy/numpy/distutils/command/install.py", line 36, in setuptools_run
      return distutils_install.run(self)
    File "/usr/lib64/python3.7/distutils/command/install.py", line 556, in run
      self.run_command('build')
    File "/usr/lib64/python3.7/distutils/cmd.py", line 313, in run_command
      self.distribution.run_command(command)
    File "/usr/lib64/python3.7/distutils/dist.py", line 985, in run_command
      cmd_obj.run()
    File "/mnt/tmp/pip-install-tl9eju6y/numpy/numpy/distutils/command/build.py", line 47, in run
      old_build.run(self)
    File "/usr/lib64/python3.7/distutils/command/build.py", line 135, in run
      self.run_command(cmd_name)
    File "/usr/lib64/python3.7/distutils/cmd.py", line 313, in run_command
      self.distribution.run_command(command)
    File "/usr/lib64/python3.7/distutils/dist.py", line 985, in run_command
      cmd_obj.run()
    File "/mnt/tmp/pip-install-tl9eju6y/numpy/numpy/distutils/command/build_src.py", line 148, in run
      self.build_sources()
    File "/mnt/tmp/pip-install-tl9eju6y/numpy/numpy/distutils/command/build_src.py", line 165, in build_sources
      self.build_extension_sources(ext)
    File "/mnt/tmp/pip-install-tl9eju6y/numpy/numpy/distutils/command/build_src.py", line 322, in build_extension_sources
      sources = self.generate_sources(sources, ext)
    File "/mnt/tmp/pip-install-tl9eju6y/numpy/numpy/distutils/command/build_src.py", line 375, in generate_sources
      source = func(extension, build_dir)
    File "numpy/core/setup.py", line 423, in generate_config_h
      moredefs, ignored = cocache.check_types(config_cmd, ext, build_dir)
    File "numpy/core/setup.py", line 47, in check_types
      out = check_types(*a, **kw)
    File "numpy/core/setup.py", line 281, in check_types
      "install {0}-dev|{0}-devel.".format(python))
  SystemError: Cannot compile 'Python.h'. Perhaps you need to install python-dev|python-devel.
  ----------------------------------------
  ERROR: Command errored out with exit status 1: /bin/python3 -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/mnt/tmp/pip-install-tl9eju6y/numpy/setup.py'"'"'; __file__='"'"'/mnt/tmp/pip-install-tl9eju6y/numpy/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /mnt/tmp/pip-record-paofd9vx/install-record.txt --single-version-externally-managed --prefix /mnt/tmp/pip-build-env-yy928eo_/overlay --compile --install-headers /mnt/tmp/pip-build-env-yy928eo_/overlay/include/python3.7m/numpy
  Check the logs for full command output.
----------------------------------------
ERROR: Command errored out with exit status 1: /bin/python3 /usr/lib/python3.7/site-packages/pip install --ignore-installed --no-user --prefix /mnt/tmp/pip-build-env-yy928eo_/overlay --no-warn-script-location --no-binary :none: --only-binary :none: -i https://pypi.org/simple -- 'cython >= 0.29' 'numpy==1.14.5; python_version<'"'"'3.7'"'"'' 'numpy==1.16.0; python_version>='"'"'3.7'"'"'' setuptools setuptools_scm wheel
Check the logs for full command output.
```
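The traceback bottoms out in `SystemError: Cannot compile 'Python.h'. Perhaps you need to install python-dev|python-devel.`, i.e. pip could not find a prebuilt aarch64 wheel inside pandas' build environment and fell back to compiling numpy from source on the Graviton node without the Python headers present. A minimal bootstrap sketch of one possible workaround, assuming Amazon Linux 2 (where the header package is named `python3-devel`) and that a newer pip can select published aarch64 wheels for the pinned versions:

```bash
#!/bin/bash
# Sketch only: install a C toolchain and the Python headers so any source
# build can compile, and upgrade pip so it recognizes manylinux2014_aarch64
# wheel tags and can skip source builds where a prebuilt wheel exists.
set -euo pipefail

sudo yum install -y gcc python3-devel       # provides cc and Python.h
sudo python3 -m pip install --upgrade pip   # older pip may ignore aarch64 wheels

sudo python3 -m pip install wheel
sudo python3 -m pip install -r requirements.txt
```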
1 answer · 0 votes · 21 views · asked 6 days ago

Executing Hive CREATE TABLE in spark.sql -- java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.S3AFileSystem not found

1. The issue: We have a Spark EMR cluster that connects to a remote Hive metastore to use our EMR Hive data warehouse. When executing this PySpark statement in a Zeppelin notebook:

   ```
   sc.sql("create table userdb_emr_search.test_table (id int, attr string)")
   ```

   we got this exception:

   ```
   java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.S3AFileSystem not found
   ```

2. EMR Spark cluster configuration:

   ```
   Release label: emr-6.3.0
   Hadoop distribution: Amazon 3.2.1
   Applications: Spark 3.1.1, JupyterHub 1.2.0, Ganglia 3.7.2, Zeppelin 0.9.0
   ```

3. The class org.apache.hadoop.fs.s3a.S3AFileSystem is on the Spark classpath correctly:

   ```
   'spark.executor.extraClassPath', '....:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:....'
   'spark.driver.extraClassPath', '....:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:....'
   ```

4. The jar files are in the right places. Under /usr/lib/hadoop:

   ```
   -rw-r--r-- 1 root root  501704 Mar 30  2021 hadoop-aws-3.2.1-amzn-3.jar
   lrwxrwxrwx 1 root root      27 Sep  8 01:50 hadoop-aws.jar -> hadoop-aws-3.2.1-amzn-3.jar
   -rw-r--r-- 1 root root 4175105 Mar 30  2021 hadoop-common-3.2.1-amzn-3.jar
   lrwxrwxrwx 1 root root      30 Sep  8 01:50 hadoop-common.jar -> hadoop-common-3.2.1-amzn-3.jar
   ```

   Under /usr/share/aws/aws-java-sdk/:

   ```
   -rw-r--r-- 1 root root 216879203 Apr  1  2021 aws-java-sdk-bundle-1.11.977.jar
   ```

5. Hadoop storage: we use Amazon S3 for Hadoop storage instead of HDFS.

6. Error log when executing the Spark SQL create table in the Zeppelin notebook:

   ```
   WARN [2022-09-05 03:24:11,785] ({SchedulerFactory3} NotebookServer.java[onStatusChange]:1928) - Job paragraph_1662330571651_66787638 is finished, status: ERROR, exception: null, result: %text Fail to execute line 2: sc.sql("create table userdb_emr_search.test_table (id int, attr string)")
   Traceback (most recent call last):
     File "/tmp/1662348163304-0/zeppelin_python.py", line 158, in <module>
       exec(code, _zcUserQueryNameSpace)
     File "<stdin>", line 2, in <module>
     File "/usr/lib/spark/python/pyspark/sql/session.py", line 723, in sql
       return DataFrame(self._jsparkSession.sql(sqlQuery), self._wrapped)
     File "/usr/lib/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1305, in __call__
       answer, self.gateway_client, self.target_id, self.name)
     File "/usr/lib/spark/python/pyspark/sql/utils.py", line 117, in deco
       raise converted from None
   pyspark.sql.utils.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.S3AFileSystem not found)
   INFO [2022-09-05 03:24:11,785] ({SchedulerFactory3} VFSNotebookRepo.java[save]:144) - Saving note 2HDK22P2Z to Untitled Note 1_2HDK22P2Z.zpln
   ```

Please help investigate why Spark SQL cannot see the class org.apache.hadoop.fs.s3a.S3AFileSystem even though its jar files are in the right place and on the correct classpath.
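Worth noting that the `MetaException` is raised inside the Hive metastore call, so the missing class may be on the remote metastore host rather than on the Spark cluster whose classpath is shown above: the metastore itself tries to resolve the table's `s3a://` location when it creates the table. A sketch of two possible directions, with illustrative paths and an assumed metastore service name:

```bash
# Option 1 (assumes the remote Hive metastore host is the one missing the
# class): copy hadoop-aws and the AWS SDK bundle into the metastore's lib
# directory and restart it. Paths and service name are illustrative.
sudo cp /usr/lib/hadoop/hadoop-aws.jar /usr/lib/hive/lib/
sudo cp /usr/share/aws/aws-java-sdk/aws-java-sdk-bundle-1.11.977.jar /usr/lib/hive/lib/
sudo systemctl restart hive-metastore

# Option 2: on EMR the native connector is EMRFS, which serves s3:// URIs
# without S3AFileSystem at all, so creating the table with an explicit s3://
# LOCATION (bucket name hypothetical) sidesteps the s3a classes entirely:
#   sc.sql("create table userdb_emr_search.test_table (id int, attr string)
#           location 's3://mybucket/warehouse/test_table'")
```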
1 answer · 0 votes · 45 views · asked a month ago

Why does running a custom JAR on AWS EMR give a file system error: error=2, No such file or directory?

I'm trying to set up a JupyterHub environment in AWS EMR. I've been following the instructions in the [documentation](https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-jupyterhub.html) without issue. I now want to add a step during setup to add users. The documentation gives a snippet on how to do so:

> **Example: Bash script to add multiple users**
>
> The following sample bash script ties together the previous steps in this section to create multiple JupyterHub users. The script can be run directly on the main node, or it can be uploaded to Amazon S3 and then run as a step.
>
> ```
> # Bulk add users to container and JupyterHub with temp password of username
> set -x
> USERS=(shirley diego ana richard li john mary anaya)
> TOKEN=$(sudo docker exec jupyterhub /opt/conda/bin/jupyterhub token jovyan | tail -1)
> for i in "${USERS[@]}";
> do
>    sudo docker exec jupyterhub useradd -m -s /bin/bash -N $i
>    sudo docker exec jupyterhub bash -c "echo $i:$i | chpasswd"
>    curl -XPOST --silent -k https://$(hostname):9443/hub/api/users/$i \
>    -H "Authorization: token $TOKEN" | jq
> done
> ```
>
> Save the script to a location in Amazon S3 such as s3://mybucket/createjupyterusers.sh. Then you can use script-runner.jar to run it as a step.

https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-jupyterhub-pam-users.html

After following the procedure above I successfully launched the EMR cluster. I now want to use my own script in place of the shell script above, but I'm having issues running it. I get the following error:

```
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/lib/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/tez/lib/slf4j-reload4j-1.7.36.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Exception in thread "main" java.lang.RuntimeException: java.io.IOException: Cannot run program "/mnt/var/lib/hadoop/steps/s-23QLFU7JXPPM7/./add_users_ERM.sh" (in directory "."): error=2, No such file or directory
	at com.amazon.elasticmapreduce.scriptrunner.ProcessRunner.exec(ProcessRunner.java:143)
	at com.amazon.elasticmapreduce.scriptrunner.ScriptRunner.main(ScriptRunner.java:58)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.hadoop.util.RunJar.run(RunJar.java:323)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:236)
Caused by: java.io.IOException: Cannot run program "/mnt/var/lib/hadoop/steps/s-23QLFU7JXPPM7/./add_users_ERM_Linuxx.sh" (in directory "."): error=2, No such file or directory
	at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
	at com.amazon.elasticmapreduce.scriptrunner.ProcessRunner.exec(ProcessRunner.java:96)
	... 7 more
Caused by: java.io.IOException: error=2, No such file or directory
	at java.lang.UNIXProcess.forkAndExec(Native Method)
	at java.lang.UNIXProcess.<init>(UNIXProcess.java:247)
	at java.lang.ProcessImpl.start(ProcessImpl.java:134)
	at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
	... 8 more
```

*Could someone explain why running script-runner.jar to run my shell script fails, but it works fine when using the shell script in the tutorial?*

----------

**add_users_ERM.sh for reference:**

```python
#!/opt/conda/bin/python
import os
import subprocess
import traceback
import sys

TOKEN="$(sudo docker exec jupyterhub /opt/conda/bin/jupyterhub token jovyan | tail -1)"

def users_from_text(file):
    # Now we have users and their team, we can create a user account and assign them to a team
    for user in file:
        username = user
        print(f"Adding {user}")
        cmd = ["sudo", "docker", "exec", "jupyterhub", "useradd", "-m", "-s", "/bin/bash", "-N", username]
        p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        output, error = p.communicate()
        output = output.strip().decode("utf-8")
        error = error.decode("utf-8")
        if p.returncode != 0:
            print(f"Error adding user: {error}")
        else:
            print(f"{user} was added")
        cmd = ["sudo", "docker", "exec", "jupyterhub", "bash", "-c", f"echo {username}:{username} | chpasswd"]
        p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        output, error = p.communicate()
        output = output.strip().decode("utf-8")
        error = error.decode("utf-8")
        if p.returncode != 0:
            print(f"Error adding password: {error}")
        else:
            print(f"{user} password was added")
        cmd = ["curl", "-XPOST", "--silent", "-k", f"https://$(hostname):9443/hub/api/users/{username}", "-H", f"Authorization: token {TOKEN}", "|", "jq"]
        p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        output, error = p.communicate()
        output = output.strip().decode("utf-8")
        error = error.decode("utf-8")
        if p.returncode != 0:
            print(f"Error adding user to JH: {error}")
        else:
            print(f"{user} was added to JH")
        # To do: Convert api call to subprocess request
    return output

test_data = ["worker_1", "worker_2", "worker_3", "worker_4", "worker_5", "worker_6"]
txt_file = test_data
print("Attempting add_user.sh script")
output = users_from_text(txt_file)
```
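One likely explanation, offered as a hypothesis: `error=2, No such file or directory` from `ProcessBuilder` usually refers to the interpreter named in the script's shebang (or to a first line ending in a Windows CR), not to the script file itself, which script-runner has already downloaded to `/mnt/var/lib/hadoop/steps/...`. The reference script's shebang is `#!/opt/conda/bin/python`, and `/opt/conda` exists inside the jupyterhub Docker container but not on the EMR node, so the host cannot launch the interpreter. A sketch of a fix, assuming the node's system Python is sufficient and with an illustrative bucket name:

```bash
# Point the shebang at an interpreter that exists on the EMR node itself,
# not only inside the jupyterhub container, then strip any CRLF endings.
sed -i '1s|^#!/opt/conda/bin/python$|#!/usr/bin/env python3|' add_users_ERM.sh
sed -i 's/\r$//' add_users_ERM.sh   # no-op if the file already uses LF endings

# Re-upload and run as a step with script-runner.jar as in the tutorial:
aws s3 cp add_users_ERM.sh s3://mybucket/add_users_ERM.sh
```

Separately, note that `TOKEN="$(sudo docker exec ...)"` and `https://$(hostname):9443/...` are shell substitutions that Python stores as literal strings, and passing `"|", "jq"` in a `subprocess` argument list does not create a pipe; those parts would also need reworking once the script actually starts.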
0 answers · 0 votes · 44 views · asked 2 months ago