AWS Glue Jupyter Notebook additional modules

0

Hi guys,

I'm trying to create a Glue job with a Jupyter notebook, but I can't seem to import external modules. I installed an external module following the documentation here: https://docs.aws.amazon.com/glue/latest/ug/notebook-getting-started.html

%additional_python_modules s3://notebook-mod/simple_salesforce.whl

When I run the cell with my import statement, I get an error (the screenshot of the error is not available).

Am I missing anything? Thanks!

asked 2 years ago, 4846 views
4 Answers
2

Hello,

Can you please try providing the module names directly instead of the .whl file? Please try the line below and let me know how it goes.

%additional_python_modules simple-salesforce,pandera

AWS
answered 2 years ago
  • This worked for me.

0
Accepted Answer

Hello,

You have used the correct approach to install external Python modules in a Glue Studio notebook, which uses Glue 2.0/Glue 3.0.

To investigate, I set this up in my own environment using the steps below:

  1. Create a Glue Studio notebook (navigate to the Glue console --> in the left side panel, click Glue Studio --> select Jupyter Notebook).
  2. Download the simple-salesforce .whl file from PyPI (https://files.pythonhosted.org/packages/60/3c/647da942ce0e1f024dc3e188ebc60ee28972ba1254e691e3512511b9062a/simple_salesforce-1.12.1-py2.py3-none-any.whl) and upload it to S3.
  3. Use the code below to install simple_salesforce:
%additional_python_modules s3://library/simple_salesforce-1.12.1-py2.py3-none-any.whl

from simple_salesforce import Salesforce

It executed successfully without any issues. In your case, I suspect you are using a SageMaker notebook backed by a Glue dev endpoint, which uses Glue 1.0 and does not support additional_python_modules. Can you please check and confirm whether you are using the correct type of notebook?
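If it helps with debugging, here is a minimal sketch for checking from inside the session whether the modules were actually installed, before running the real imports. The helper name `missing_modules` is made up for this example and is not part of Glue. Note that it takes import names (`simple_salesforce`), which can differ from the PyPI package names (`simple-salesforce`).

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found in this session."""
    # find_spec returns None when a top-level module is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]


# An empty list means every module can be imported.
print(missing_modules(["simple_salesforce", "pandera"]))
```

If a name shows up in the printed list, the install step did not pick up that module, so the problem is in the %additional_python_modules line rather than in the import statement.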

Reference:

[1] https://docs.aws.amazon.com/glue/latest/ug/notebook-getting-started.html

AWS
answered 2 years ago
  • I'm using an interactive notebook in Glue Studio. Following your instructions, it works fine with one module.

    Is the process the same for more than one additional module? When I run the lines below, I get 'ModuleNotFoundError: No module named 'simple_salesforce''

    %additional_python_modules s3://modules/simple_salesforce-1.12.1-py2.py3-none-any.whl, pandera-0.11.0-py3-none-any.whl

    from simple_salesforce import Salesforce
    import pandera

0

Hello,

You can provide multiple Python modules with %additional_python_modules in a notebook. In the example above, you did not provide the absolute path of the .whl file for the pandera module. Please provide the absolute path for each module, separated by commas.

%additional_python_modules s3://library/simple_salesforce-1.12.1-py2.py3-none-any.whl, s3://library/pandera-0.11.0-py3-none-any.whl
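To avoid this kind of slip when the list grows, one option is to build the line with a small helper and paste the result into the first cell. This is only a sketch: `build_modules_magic` is a hypothetical name, not a Glue or Jupyter API. It assumes each entry is either a package name or a full s3:// path to a wheel, and it joins entries with a comma and no space, as in the `simple-salesforce,pandera` example earlier in this thread.

```python
def build_modules_magic(modules):
    """Build a %additional_python_modules line from package names or s3:// wheel paths."""
    cleaned = []
    for entry in modules:
        entry = entry.strip()
        # A bare wheel file name (no s3:// prefix) is the mistake from the comment above.
        if entry.endswith(".whl") and not entry.startswith("s3://"):
            raise ValueError(f"wheel needs a full s3:// path: {entry}")
        cleaned.append(entry)
    return "%additional_python_modules " + ",".join(cleaned)


print(build_modules_magic([
    "s3://library/simple_salesforce-1.12.1-py2.py3-none-any.whl",
    "s3://library/pandera-0.11.0-py3-none-any.whl",
]))
```

Passing `pandera-0.11.0-py3-none-any.whl` without the bucket prefix raises a ValueError instead of producing a line that installs only part of the list.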
AWS
answered 2 years ago
0

It works when I install just one module; I can import the additional module without issue. I tried the module in separate code and it works. (Screenshot not available.)

When I install more than one module following the documentation, it doesn't work. (Screenshot not available.)

Let me know what else I am missing. Thanks for your help so far.

answered 2 years ago
