EMR bootstrap script with pip numpy installation fails on r6+ instances


I recently tested moving from r5 to r6 instance fleets for our PySpark script. The script depends on numpy and pandas, which are installed via pip in a bootstrap script, along with a few other dependencies for communicating with S3:

#!/bin/bash -xe

echo "---------------------------------------------------------"

echo "using python version:"
python3 --version
echo "initial python packages (sudo python3 -m pip list):"
sudo python3 -m pip list

echo "---------------------------------------------------------"

echo "install python3-dev development tools"
sudo yum -y install python3-devel

echo "---------------------------------------------------------"

echo "installing python dependencies"
sudo python3 -m pip install -U pip
echo "pip installed/updated"
sudo python3 -m pip install -U setuptools
echo "setuptools installed"
sudo python3 -m pip install \
    cloudpickle==1.6.0 \
    boto3==1.21.7 \
    fsspec==2022.2.0 \
    s3fs==0.4.2
echo "aws dependencies installed (boto3, cloudpickle, fsspec, s3fs)"
# sudo python3 -m pip install \
#     pandas==1.1.5 \
#     numpy==1.16.5
# echo "pandas + numpy installed"
sudo python3 -m pip install pandas==1.2.5
echo "pandas installed"

echo "final python packages (sudo python3 -m pip list):"
sudo python3 -m pip list

This runs without failure on r5 instances, and numpy is available in the python environment as expected.

When allowing (r6, r6g) instance types, the bootstrap script fails with the following message:

      _configtest.c:1:10: fatal error: Python.h: No such file or directory
       #include <Python.h>
                ^~~~~~~~~~
      compilation terminated.
      failure.
      removing: _configtest.c _configtest.o
      Traceback (most recent call last):
        File "<string>", line 36, in <module>
        File "<pip-setuptools-caller>", line 34, in <module>
        File "/mnt/tmp/pip-install-np4ypx_v/numpy_833906a79d1d4dfeb4789de3524fd268/setup.py", line 419, in <module>
          setup_package()
        File "/mnt/tmp/pip-install-np4ypx_v/numpy_833906a79d1d4dfeb4789de3524fd268/setup.py", line 411, in setup_package
          setup(**metadata)
        File "/mnt/tmp/pip-install-np4ypx_v/numpy_833906a79d1d4dfeb4789de3524fd268/numpy/distutils/core.py", line 171, in setup
          return old_setup(**new_attr)
        File "/usr/local/lib/python3.7/site-packages/setuptools/__init__.py", line 155, in setup
          return distutils.core.setup(**attrs)
        File "/usr/local/lib/python3.7/site-packages/setuptools/_distutils/core.py", line 148, in setup
          return run_commands(dist)
        File "/usr/local/lib/python3.7/site-packages/setuptools/_distutils/core.py", line 163, in run_commands
          dist.run_commands()
        File "/usr/local/lib/python3.7/site-packages/setuptools/_distutils/dist.py", line 967, in run_commands
          self.run_command(cmd)
        File "/usr/local/lib/python3.7/site-packages/setuptools/_distutils/dist.py", line 986, in run_command
          cmd_obj.run()
        File "/mnt/tmp/pip-install-np4ypx_v/numpy_833906a79d1d4dfeb4789de3524fd268/numpy/distutils/command/install.py", line 62, in run
          r = self.setuptools_run()
        File "/mnt/tmp/pip-install-np4ypx_v/numpy_833906a79d1d4dfeb4789de3524fd268/numpy/distutils/command/install.py", line 36, in setuptools_run
          return distutils_install.run(self)
        File "/usr/local/lib/python3.7/site-packages/setuptools/_distutils/command/install.py", line 670, in run
          self.run_command('build')
        File "/usr/local/lib/python3.7/site-packages/setuptools/_distutils/cmd.py", line 313, in run_command
          self.distribution.run_command(command)
        File "/usr/local/lib/python3.7/site-packages/setuptools/_distutils/dist.py", line 986, in run_command
          cmd_obj.run()
        File "/mnt/tmp/pip-install-np4ypx_v/numpy_833906a79d1d4dfeb4789de3524fd268/numpy/distutils/command/build.py", line 47, in run
          old_build.run(self)
        File "/usr/local/lib/python3.7/site-packages/setuptools/_distutils/command/build.py", line 135, in run
          self.run_command(cmd_name)
        File "/usr/local/lib/python3.7/site-packages/setuptools/_distutils/cmd.py", line 313, in run_command
          self.distribution.run_command(command)
        File "/usr/local/lib/python3.7/site-packages/setuptools/_distutils/dist.py", line 986, in run_command
          cmd_obj.run()
        File "/mnt/tmp/pip-install-np4ypx_v/numpy_833906a79d1d4dfeb4789de3524fd268/numpy/distutils/command/build_src.py", line 148, in run
          self.build_sources()
        File "/mnt/tmp/pip-install-np4ypx_v/numpy_833906a79d1d4dfeb4789de3524fd268/numpy/distutils/command/build_src.py", line 165, in build_sources
          self.build_extension_sources(ext)
        File "/mnt/tmp/pip-install-np4ypx_v/numpy_833906a79d1d4dfeb4789de3524fd268/numpy/distutils/command/build_src.py", line 322, in build_extension_sources
          sources = self.generate_sources(sources, ext)
        File "/mnt/tmp/pip-install-np4ypx_v/numpy_833906a79d1d4dfeb4789de3524fd268/numpy/distutils/command/build_src.py", line 375, in generate_sources
          source = func(extension, build_dir)
        File "numpy/core/setup.py", line 423, in generate_config_h
          moredefs, ignored = cocache.check_types(config_cmd, ext, build_dir)
        File "numpy/core/setup.py", line 47, in check_types
          out = check_types(*a, **kw)
        File "numpy/core/setup.py", line 281, in check_types
          "install {0}-dev|{0}-devel.".format(python))
      SystemError: Cannot compile 'Python.h'. Perhaps you need to install python-dev|python-devel.
      [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
error: legacy-install-failure

× Encountered error while trying to install package.
╰─> numpy

note: This is an issue with the package mentioned above, not pip.
hint: See above for output from the failure.

Note that this bootstrap script already attempts to address the problem named in the error message by installing python3-devel via yum before running the pip install.
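Since python3-devel is installed and the build still cannot find Python.h, one thing worth checking is whether the header actually sits in the include directory that this python3 interpreter reports. The following is a minimal sketch of such a check, not a confirmed diagnosis; has_python_header is a hypothetical helper name introduced only for this example.

```shell
#!/bin/bash
# Sketch: report whether Python.h exists in a given include directory.
# has_python_header is a hypothetical helper name, not part of any tool.
has_python_header() {
    local include_dir="$1"
    [ -f "${include_dir}/Python.h" ]
}

# Ask the interpreter where it expects its C headers to live.
if command -v python3 >/dev/null 2>&1; then
    include_dir="$(python3 -c 'import sysconfig; print(sysconfig.get_paths()["include"])')"
    echo "python3 include dir: ${include_dir}"
    if has_python_header "${include_dir}"; then
        echo "Python.h found"
    else
        echo "Python.h missing; python3-devel may not match this interpreter"
    fi
fi
```

Adding this to the bootstrap script right after the yum step would show in the bootstrap log whether the header is visible on the r6g nodes.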

asked 2 years ago, 1066 views
1 Answer

I suspect you need an arm64 build of numpy. You may have to build it as a wheel file and then add it to the EMR cluster.

AWS
Alex_T
answered 2 years ago
  • Ahh, good thinking, that makes sense. That solution is reasonable, but it also means that I would have to fully commit my instance fleet to the ARM architecture if the wheel comes in via my Spark --py-files parameter.
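Because a bootstrap action runs on each node, a mixed fleet could in principle let every node resolve packages for its own architecture instead of shipping one wheel through --py-files. The sketch below detects the architecture and, only when explicitly enabled, asks pip to use prebuilt wheels exclusively so that a missing wheel fails fast rather than falling back to a source build. arch_family and RUN_INSTALL are hypothetical names for this example, and this thread does not establish whether arm64 wheels exist for the pinned versions.

```shell
#!/bin/bash
# Sketch: map `uname -m` output to a coarse architecture family.
# arch_family is a hypothetical helper name used only for this example.
arch_family() {
    case "$1" in
        aarch64|arm64) echo "arm64" ;;
        x86_64|amd64)  echo "x86_64" ;;
        *)             echo "unknown" ;;
    esac
}

family="$(arch_family "$(uname -m)")"
echo "node architecture family: ${family}"

# RUN_INSTALL is a hypothetical opt-in switch so the sketch has no side
# effects unless it is set to 1.
if [ "${RUN_INSTALL:-0}" = "1" ]; then
    # --only-binary=:all: tells pip to refuse source distributions, so the
    # install stops with a clear error if no wheel matches this node.
    sudo python3 -m pip install --only-binary=:all: pandas==1.2.5
fi
```

If the install fails on arm64 nodes with this flag, that would support the suggestion above that a wheel has to be built for that architecture.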
