Issue importing custom Python modules in Jupyter notebooks with SparkMagic and AWS Glue endpoint


I'm encountering an issue while running ETL scripts in Jupyter notebooks using SparkMagic, which is connected to an AWS Glue development endpoint via SSH. I followed the tutorial in the AWS Glue documentation (https://docs.aws.amazon.com/glue/latest/dg/dev-endpoint-tutorial-local-jupyter.html) and successfully set up the connection. I can run PySpark without any problems.

However, I'm facing difficulties when trying to import custom Python modules that I created. I have ensured that I uploaded my files to the Glue endpoint and placed them in a file directory that I appended to the Jupyter notebook's search path. When attempting to import the module using the following code:

from test.base.text import DataFields

It fails to import because the Python interpreter is set to use /usr/bin/python3 by default. Instead, I need to use the /usr/bin/gluepython3 interpreter.

sys.executable

returns

/usr/bin/python3

I have tried several steps to make it use the correct Python interpreter, including:

  1. Configuring sparkmagic to use gluepython3: %%configure -f { "conf": { "spark.pyspark.python": "/usr/bin/gluepython3" } }

  2. Setting the PYSPARK_PYTHON environment variable in the notebook: alias python="/usr/bin/gluepython3"

  3. Modifying the .bashrc file on the Glue endpoint and creating an alias for Python to point to gluepython3: alias python="/usr/bin/gluepython3"

Despite trying these approaches, I have only been able to successfully import the module when running the code outside of Jupyter notebook using an SSH shell and manually invoking the Python file on the endpoint:

/usr/bin/gluepython3 /location/to/file.py
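To compare the notebook session with the SSH shell, a small diagnostic along these lines could be run in both places. This is only a sketch: `describe_interpreter` is a hypothetical helper name, and the module name is the one from the import above. It reports which interpreter is running and whether the module resolves on that interpreter's search path. Since `test` is also the name of a package that ships with many CPython installations, the reported origin is worth checking as well.

```python
import importlib.util
import sys


def describe_interpreter(module_name):
    """Report the running interpreter and whether module_name can be located."""
    try:
        # find_spec imports parent packages, so a missing parent raises here.
        spec = importlib.util.find_spec(module_name)
    except (ImportError, ValueError):
        spec = None
    return {
        "executable": sys.executable,           # interpreter actually in use
        "path": list(sys.path),                 # search path seen by that interpreter
        "found": spec is not None,              # True if the module can be located
        "origin": getattr(spec, "origin", None),  # file the module would load from
    }


info = describe_interpreter("test.base.text")
print(info["executable"])
print(info["found"], info["origin"])
```

If the two environments print different `executable` or `path` values, that difference is the first thing to look at.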

Any suggestions or guidance on how to resolve this issue and make the custom module import work within Jupyter notebooks using SparkMagic and the AWS Glue endpoint would be greatly appreciated.

Thank you in advance!

2 Answers

To add to the above, I would recommend that you move to Glue interactive sessions.

Dev endpoints are no longer being developed and support only Glue version 1.

By switching to interactive sessions, you can choose the Glue version (2, 3, or 4), and hence the Spark version, dynamically. Interactive sessions also greatly simplify adding Python modules, which is done through magics.

AWS EXPERT, answered a year ago

As I understand it, you are uploading the custom module to the local directory structure of your Glue dev endpoint over SSH and then trying to add it to the search path. However, according to the documentation, custom Python modules for a development endpoint are supplied as dependency S3 paths when the development endpoint is created.

Python library path

    Comma-separated Amazon Simple Storage Service (Amazon S3) paths to Python libraries that are required by your script. Multiple values must be complete paths separated by a comma (,). Only individual files are supported, not a directory path.

To use your custom libraries, build a .whl file from your custom module, upload it to S3, and pass its S3 path as the "Python library path" parameter of the dev endpoint.

For more information on how to create a .whl file, refer to the official Python documentation, or run the following pip command in the directory of the Python package, using the latest pip and wheel:
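As a rough illustration of the format quoted above, the following sketch joins individual S3 file paths into the comma-separated value and rejects directory paths. The helper name `build_python_library_path` and the bucket and file names are hypothetical. The commented-out call shows where the value could go when creating the endpoint with boto3, assuming boto3 is installed and credentials are configured.

```python
def build_python_library_path(s3_uris):
    """Join individual S3 file URIs into a comma-separated library path."""
    cleaned = []
    for uri in s3_uris:
        uri = uri.strip()
        if not uri.startswith("s3://"):
            raise ValueError(f"not an S3 URI: {uri!r}")
        # Only individual files are supported, so reject directory-style paths.
        if uri.endswith("/"):
            raise ValueError(f"directory paths are not supported: {uri!r}")
        cleaned.append(uri)
    return ",".join(cleaned)


# Hypothetical bucket and wheel name, for illustration only.
library_path = build_python_library_path([
    "s3://example-bucket/libs/mypackage-0.1.0-py3-none-any.whl",
])
print(library_path)

# The value could then be passed at endpoint creation, for example:
#   import boto3
#   boto3.client("glue").create_dev_endpoint(
#       EndpointName="my-endpoint",   # hypothetical name
#       RoleArn="...",                # your Glue service role ARN
#       ExtraPythonLibsS3Path=library_path,
#   )
```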

pip wheel .

If this still doesn't resolve the issue, I would recommend trying other options such as Glue interactive sessions or notebooks.

Unlike AWS Glue development endpoints, AWS Glue interactive sessions are serverless with no infrastructure to manage. You can start interactive sessions very quickly. Interactive sessions have a 1-minute billing minimum with cost-control features. This reduces the cost of developing data preparation applications.

AWS, answered a year ago (reviewed by AWS EXPERT a year ago)
