I recently tested moving our PySpark script from r5 to r6 instance fleets. The script depends on numpy and pandas, which are installed via pip in a bootstrap script along with a few other dependencies for communicating with S3:
#!/bin/bash -xe
echo "---------------------------------------------------------"
echo "using python version:"
python3 --version
echo "initial python packages (sudo python3 -m pip list):"
sudo python3 -m pip list
echo "---------------------------------------------------------"
echo "install python3-dev development tools"
sudo yum -y install python3-devel
echo "---------------------------------------------------------"
echo "installing python dependencies"
sudo python3 -m pip install -U pip
echo "pip installed/updated"
sudo python3 -m pip install -U setuptools
echo "setuptools installed"
sudo python3 -m pip install \
cloudpickle==1.6.0 \
boto3==1.21.7 \
fsspec==2022.2.0 \
s3fs==0.4.2
echo "aws dependencies installed (boto3, cloudpickle, fsspec, s3fs)"
# sudo python3 -m pip install \
# pandas==1.1.5 \
# numpy==1.16.5
# echo "pandas + numpy installed"
sudo python3 -m pip install pandas==1.2.5
echo "pandas installed"
echo "final python packages (sudo python3 -m pip list):"
sudo python3 -m pip list
This runs without failure on r5 instances, and numpy is available in the Python environment as expected.
When allowing r6 and r6g instance types, the bootstrap script fails with the following message:
_configtest.c:1:10: fatal error: Python.h: No such file or directory
#include <Python.h>
^~~~~~~~~~
compilation terminated.
failure.
removing: _configtest.c _configtest.o
Traceback (most recent call last):
File "<string>", line 36, in <module>
File "<pip-setuptools-caller>", line 34, in <module>
File "/mnt/tmp/pip-install-np4ypx_v/numpy_833906a79d1d4dfeb4789de3524fd268/setup.py", line 419, in <module>
setup_package()
File "/mnt/tmp/pip-install-np4ypx_v/numpy_833906a79d1d4dfeb4789de3524fd268/setup.py", line 411, in setup_package
setup(**metadata)
File "/mnt/tmp/pip-install-np4ypx_v/numpy_833906a79d1d4dfeb4789de3524fd268/numpy/distutils/core.py", line 171, in setup
return old_setup(**new_attr)
File "/usr/local/lib/python3.7/site-packages/setuptools/__init__.py", line 155, in setup
return distutils.core.setup(**attrs)
File "/usr/local/lib/python3.7/site-packages/setuptools/_distutils/core.py", line 148, in setup
return run_commands(dist)
File "/usr/local/lib/python3.7/site-packages/setuptools/_distutils/core.py", line 163, in run_commands
dist.run_commands()
File "/usr/local/lib/python3.7/site-packages/setuptools/_distutils/dist.py", line 967, in run_commands
self.run_command(cmd)
File "/usr/local/lib/python3.7/site-packages/setuptools/_distutils/dist.py", line 986, in run_command
cmd_obj.run()
File "/mnt/tmp/pip-install-np4ypx_v/numpy_833906a79d1d4dfeb4789de3524fd268/numpy/distutils/command/install.py", line 62, in run
r = self.setuptools_run()
File "/mnt/tmp/pip-install-np4ypx_v/numpy_833906a79d1d4dfeb4789de3524fd268/numpy/distutils/command/install.py", line 36, in setuptools_run
return distutils_install.run(self)
File "/usr/local/lib/python3.7/site-packages/setuptools/_distutils/command/install.py", line 670, in run
self.run_command('build')
File "/usr/local/lib/python3.7/site-packages/setuptools/_distutils/cmd.py", line 313, in run_command
self.distribution.run_command(command)
File "/usr/local/lib/python3.7/site-packages/setuptools/_distutils/dist.py", line 986, in run_command
cmd_obj.run()
File "/mnt/tmp/pip-install-np4ypx_v/numpy_833906a79d1d4dfeb4789de3524fd268/numpy/distutils/command/build.py", line 47, in run
old_build.run(self)
File "/usr/local/lib/python3.7/site-packages/setuptools/_distutils/command/build.py", line 135, in run
self.run_command(cmd_name)
File "/usr/local/lib/python3.7/site-packages/setuptools/_distutils/cmd.py", line 313, in run_command
self.distribution.run_command(command)
File "/usr/local/lib/python3.7/site-packages/setuptools/_distutils/dist.py", line 986, in run_command
cmd_obj.run()
File "/mnt/tmp/pip-install-np4ypx_v/numpy_833906a79d1d4dfeb4789de3524fd268/numpy/distutils/command/build_src.py", line 148, in run
self.build_sources()
File "/mnt/tmp/pip-install-np4ypx_v/numpy_833906a79d1d4dfeb4789de3524fd268/numpy/distutils/command/build_src.py", line 165, in build_sources
self.build_extension_sources(ext)
File "/mnt/tmp/pip-install-np4ypx_v/numpy_833906a79d1d4dfeb4789de3524fd268/numpy/distutils/command/build_src.py", line 322, in build_extension_sources
sources = self.generate_sources(sources, ext)
File "/mnt/tmp/pip-install-np4ypx_v/numpy_833906a79d1d4dfeb4789de3524fd268/numpy/distutils/command/build_src.py", line 375, in generate_sources
source = func(extension, build_dir)
File "numpy/core/setup.py", line 423, in generate_config_h
moredefs, ignored = cocache.check_types(config_cmd, ext, build_dir)
File "numpy/core/setup.py", line 47, in check_types
out = check_types(*a, **kw)
File "numpy/core/setup.py", line 281, in check_types
"install {0}-dev|{0}-devel.".format(python))
SystemError: Cannot compile 'Python.h'. Perhaps you need to install python-dev|python-devel.
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
error: legacy-install-failure
× Encountered error while trying to install package.
╰─> numpy
note: This is an issue with the package mentioned above, not pip.
hint: See above for output from the failure.
Note that this bootstrap script already attempts to address the problem named in the error message by installing python3-devel via yum before running the pip install.
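Since python3-devel is installed and the build still cannot find Python.h, one way to narrow this down is to log the node's architecture and check whether the headers are present for the python3 that pip actually runs under. The following is only a diagnostic sketch, not a confirmed fix: the helper names `python_include_dir` and `has_python_header` are made up for illustration, and it assumes `python3` on the PATH is the same interpreter the bootstrap script uses.

```shell
#!/bin/bash
# Diagnostic sketch (hypothetical helpers): report the CPU architecture and
# whether the headers needed for a source build exist for python3 on PATH.

# Print the include directory that python3 itself reports for its headers.
python_include_dir() {
    python3 -c 'import sysconfig; print(sysconfig.get_paths()["include"])'
}

# Succeed (exit 0) only if Python.h exists in the given directory.
has_python_header() {
    [ -f "$1/Python.h" ]
}

echo "machine architecture: $(uname -m)"
include_dir="$(python_include_dir)"
echo "python3 include dir: ${include_dir}"
if has_python_header "${include_dir}"; then
    echo "Python.h found"
else
    echo "Python.h missing: pip cannot compile C extensions for this interpreter"
fi
```

Adding these lines near the top of the bootstrap script would show in the bootstrap logs whether the failing nodes are the aarch64 (r6g) ones and whether the yum package put the headers where this interpreter looks for them.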
Ahhh, good thinking, that makes sense. That solution is reasonable, but it also means that I would have to fully commit my instance fleet to the AMD architecture if it's coming in via my Spark --py-files (wheels) parameter.
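To illustrate why compiled wheels tie the fleet to one architecture, here is a rough sketch of how a bootstrap script could pick a wheel platform tag per node instead. It assumes r5 nodes report x86_64 and r6g (Graviton) nodes report aarch64 from `uname -m`; the function name `wheel_platform_tag` is hypothetical, and the pip command is left as a comment so the sketch has no side effects.

```shell
#!/bin/bash
# Sketch (hypothetical helper): choose a wheel platform tag from the
# machine architecture, so each node gets binaries built for its own CPU.

# Map a `uname -m` value to a manylinux2014 platform tag.
wheel_platform_tag() {
    case "$1" in
        x86_64)  echo "manylinux2014_x86_64" ;;
        aarch64) echo "manylinux2014_aarch64" ;;
        *)       echo "unknown"; return 1 ;;
    esac
}

arch="$(uname -m)"
if tag="$(wheel_platform_tag "${arch}")"; then
    echo "architecture ${arch}: use wheels tagged ${tag}"
    # On a real node, a binary-only install fails fast instead of compiling:
    #   sudo python3 -m pip install --only-binary=:all: pandas==1.2.5
else
    echo "architecture ${arch}: no wheel tag mapped in this sketch"
fi
```

Installing per node in the bootstrap step keeps a mixed fleet possible, whereas a single set of compiled wheels passed through --py-files can only match one of the two architectures.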