How do I resolve "ImportError: No module named" in AWS Glue?
When I try to import extra modules or packages using the AWS Glue Python shell, I get an "ImportError: No module named" error. For example: ImportError: No module named pyarrow.compat
Short description
The AWS Glue Python shell uses .egg and .whl files. Python can import directly from a .egg or .whl file. To maintain compatibility, be sure that your local build environment uses the same Python version as the Python shell job. For example, if you build a .egg file with Python 3, use Python 3 for the AWS Glue Python shell job.
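For example, you can confirm which Python version your local build environment uses before packaging (a quick check, not part of the original steps):

```shell
# Print the Python version of the local build environment;
# the Glue Python shell job must use the same major version.
python3 --version
```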
Note: Starting June 1, 2022, Python shell jobs support only Python 3. For more information, see AWS Glue version support policy.
Resolution
1. Create the setup.py file and add the install_requires parameter to list the modules that you want to import:
from setuptools import setup

setup(
    name="redshift_module",
    version="0.1",
    packages=['redshift_module'],
    install_requires=['pyarrow', 'pandas', 'numpy', 'fastparquet']
)
2. Create a folder named redshift_module under the current directory, and add an empty __init__.py file so that setuptools treats the folder as a package:

$ mkdir redshift_module
$ touch redshift_module/__init__.py
Then, install the packages:
$ python setup.py develop
Example output:
running develop
running egg_info
writing requirements to redshift_module.egg-info/requires.txt
writing redshift_module.egg-info/PKG-INFO
writing top-level names to redshift_module.egg-info/top_level.txt
writing dependency_links to redshift_module.egg-info/dependency_links.txt
reading manifest file 'redshift_module.egg-info/SOURCES.txt'
writing manifest file 'redshift_module.egg-info/SOURCES.txt'
running build_ext
Creating /usr/local/lib/python3.6/site-packages/redshift-module.egg-link (link to .)
redshift-module 0.1 is already the active version in easy-install.pth
Using /Users/test/Library/Python/3.6/lib/python/site-packages
Searching for pandas==0.24.2
Best match: pandas 0.24.2
Adding pandas 0.24.2 to easy-install.pth file
Using /usr/local/lib/python3.6/site-packages
Searching for pyarrow==0.12.1
Best match: pyarrow 0.12.1
Adding pyarrow 0.12.1 to easy-install.pth file
Installing plasma_store script to /usr/local/bin
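So that the package contains something your job can actually import, add a module to the redshift_module folder. The helper below is a hypothetical sketch (the file name, function name, and logic are illustrative and not part of the original article), showing the kind of code you might package:

```python
# redshift_module/transform.py (hypothetical example module)
import pandas as pd


def normalize_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with lowercase, underscore-separated column names."""
    out = df.copy()
    out.columns = [c.strip().lower().replace(" ", "_") for c in out.columns]
    return out
```

In the Glue job script, this would then be available as `from redshift_module.transform import normalize_columns`.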
3. Do one of the following:
Create a .egg file:
python setup.py bdist_egg
-or-
Create a .whl file:
python setup.py bdist_wheel
4. Copy the .egg or .whl file from the dist folder to an Amazon Simple Storage Service (Amazon S3) bucket. For more information, see Providing your own Python library. Example:

$ cd dist
$ aws s3 cp MOA_EDM_cdc_controller_g2-0.2.9-py3-none-any.whl s3://doc-example-bucket/glue-libs/python-shell-jobs/
upload: ./MOA_EDM_cdc_controller_g2-0.2.9-py3-none-any.whl to s3://doc-example-bucket/glue-libs/python-shell-jobs/MOA_EDM_cdc_controller_g2-0.2.9-py3-none-any.whl

Note: If you receive errors when running AWS Command Line Interface (AWS CLI) commands, make sure that you're using the most recent AWS CLI version.

5. The module is now installed in your Python shell job. To confirm, check the Amazon CloudWatch Logs group for Python shell jobs (/aws-glue/python-jobs/output). Here's an example of successful output:
Searching for pyarrow
Reading https://pypi.python.org/simple/pyarrow/
Downloading https://files.pythonhosted.org/packages/fe/3b/267c0fdb3dc5ad7989417cfb447fbcbec008bafc1bb26d4f0221c5e4e508/pyarrow-0.12.1-cp27-cp27mu-manylinux1_x86_64.whl#sha256=63170571cccaf0bf01a1d30eacc4d9274bd5c4f448c2b5b1a4ddc125952f4284
Best match: pyarrow 0.12.1
Processing pyarrow-0.12.1-cp27-cp27mu-manylinux1_x86_64.whl
Installing pyarrow-0.12.1-cp27-cp27mu-manylinux1_x86_64.whl to /glue/lib/installation
writing requirements to /glue/lib/installation/pyarrow-0.12.1-py3.6-linux-x86_64.egg/EGG-INFO/requires.txt
Adding pyarrow 0.12.1 to easy-install.pth file
Installing plasma_store script to /glue/lib/installation
Installed /glue/lib/installation/pyarrow-0.12.1-py3.6-linux-x86_64.egg
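Once the dependencies install successfully, the job script can import them directly. The snippet below is a minimal sketch (assuming pandas and numpy were listed in install_requires, as in the setup.py above), not a required part of the procedure:

```python
# Hypothetical Glue Python shell job script: verifies that dependencies
# shipped in the .egg/.whl file can be imported and used at run time.
import numpy as np
import pandas as pd

# Build a small DataFrame to prove both libraries work together.
df = pd.DataFrame({"id": [1, 2, 3], "value": np.arange(3.0)})
print("pandas", pd.__version__)
print(df.shape)
```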
Related information
How do I use external Python libraries in my AWS Glue 1.0 or 0.9 ETL job?
How do I use external Python libraries in my AWS Glue 2.0 ETL job?