
How to use external libraries in AWS Glue Python Shell

1

I am trying to use external libraries such as openpyxl in an AWS Glue Python shell job. I uploaded the wheel to S3 and referenced it in the Job details, but it does not seem to work. I also tried adding a job parameter with the required version, but nothing works. Can you suggest another way to do this, or another service where I can run my Python jobs? (The code fetches data from different databases, transforms it, and creates reports with aggregated values.)

2 Answers
1

Hi,

To add an external library to a Glue Python shell job, follow the documentation at this link.

Assuming you have already downloaded the wheel file and uploaded it to Amazon S3, add the following parameter if you are creating the job from the command line:

--default-arguments '{"--extra-py-files" : ["s3://MyBucket/python/library/openpyxl-3.0.9-py2.py3-none-any.whl"]}'
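
Shell quoting is easy to get wrong in this parameter. As a minimal sketch, the value could be generated in Python so that the JSON and the surrounding quotes stay balanced. The helper name `build_default_arguments` is hypothetical, and the list shape simply mirrors the parameter shown above.

```python
import json
import shlex


def build_default_arguments(wheel_uris):
    """Return a shell-quoted JSON value for --default-arguments.

    Hypothetical helper: it mirrors the list form used in the answer above.
    """
    payload = {"--extra-py-files": list(wheel_uris)}
    # shlex.quote wraps the JSON in single quotes so the shell passes it intact.
    return shlex.quote(json.dumps(payload))


if __name__ == "__main__":
    uri = "s3://MyBucket/python/library/openpyxl-3.0.9-py2.py3-none-any.whl"
    print("--default-arguments", build_default_arguments([uri]))
```

The printed text can be pasted after the rest of the job creation command.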

If you are creating or editing the Python shell job in the console:

Look under the "Security configuration, script libraries, and job parameters (optional)" section.

Once you locate the text box under "Python library path", paste the full S3 URI of your wheel file.

I tested it with your library and it works in my environment.

Processing ./glue-python-libs-cr2dddvq/openpyxl-3.0.9-py2.py3-none-any.whl
Collecting et-xmlfile
  Downloading et_xmlfile-1.1.0-py3-none-any.whl (4.7 kB)
Installing collected packages: et-xmlfile, openpyxl
Successfully installed et-xmlfile-1.1.0 openpyxl-3.0.9
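
To confirm from inside the job script that the library is actually visible, a small check like the following could be placed at the top of the script. This is only a sketch: the helper name `module_available` is hypothetical, and "openpyxl" is used because it is the library from the question.

```python
import importlib.util


def module_available(name):
    """Return True if `name` can be found on the current import path."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # "openpyxl" is the library from the question; swap in any module name.
    if module_available("openpyxl"):
        print("openpyxl is importable")
    else:
        print("openpyxl is NOT importable; check the Python library path setting")
```

The printed message ends up in the job's output, which makes it easier to tell an installation problem from a problem in the rest of the script.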

Hope this helps,

EXPERT
answered 6 months ago
  • I did the same thing, but it did not solve my problem. I edited the Python shell job in the console. In the new Jobs user interface (not the legacy one), I found the Python library path in the library section of the Job details tab.

0

Thank you for your question. Without more context it is hard to say what the cause is, but in general I was able to make it work based on this article: https://aws.amazon.com/premiumsupport/knowledge-center/glue-version2-external-python-libraries/

answered 8 months ago
  • The shared article does not work either. Right now I am importing just one library, openpyxl, and it gives a "No module named openpyxl" error. I passed the wheel file downloaded from the internet, and I also tried passing a key-value job parameter (key: --additional-python-modules, value: openpyxl==3.0.9).

  • Hi, this answer is actually incorrect: the link provided applies to AWS Glue Spark jobs, not to Glue Python shell jobs as requested in the question.

    It could also be improved by mentioning that you can tell whether an error occurred while importing the external library by checking the CloudWatch logs for the job.

    If there are no errors, the logs under the job run will show the package being installed.
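
    As a rough sketch of that log check, the function below scans log text that has already been copied out of CloudWatch and reports which packages pip says it installed. The function name `installed_packages` is hypothetical, and the sample text is the install output quoted in the first answer.

```python
def installed_packages(log_text):
    """Map package name -> version from pip's 'Successfully installed' lines."""
    prefix = "Successfully installed "
    found = {}
    for line in log_text.splitlines():
        line = line.strip()
        if line.startswith(prefix):
            # Each item looks like "<name>-<version>"; split on the last hyphen.
            for item in line[len(prefix):].split():
                name, _, version = item.rpartition("-")
                found[name] = version
    return found


if __name__ == "__main__":
    sample = (
        "Installing collected packages: et-xmlfile, openpyxl\n"
        "Successfully installed et-xmlfile-1.1.0 openpyxl-3.0.9\n"
    )
    print(installed_packages(sample))
```

    An empty result for a run suggests the library was never installed, which would match the "No module named openpyxl" error.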
