How to use external libraries in AWS Glue Python Shell


I am trying to use external libraries such as openpyxl. I uploaded the wheel file to S3 and referenced it in the job details, but it does not seem to work. I also tried adding a job parameter with the required version, but nothing works. Can you suggest another way to do this, or another service I can use to run my Python jobs? (The jobs fetch data from different databases, transform it, and create reports with aggregated values.)

Asked 2 years ago, 5,698 views
2 answers

Hi,

To successfully add an external library to a Glue Python Shell job, you should follow the documentation at this link.

UPDATE: as described in the link above, when using Python 3.9 the best option for installing external libraries is:

--additional-python-modules s3://aws-glue-native-spark/tests/j4.2/fbprophet-0.6-py3-none-any.whl,scikit-learn==0.21.3
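If the job is created programmatically rather than in the console, the same `--additional-python-modules` argument goes into the job's default arguments. The sketch below assumes boto3's Glue `create_job` call and only builds the request dictionary, so it runs without AWS access. The job name, role ARN, and script location are hypothetical placeholders, and fields such as `GlueVersion` are left out on purpose; check the Python shell documentation for the values your setup needs.

```python
def build_pythonshell_job_request(name, role_arn, script_location, modules):
    """Return keyword arguments for glue.create_job(**request).

    `modules` is a list of pip requirement strings or S3 wheel URIs.
    Glue job arguments are plain strings, so they are joined with commas.
    """
    return {
        "Name": name,
        "Role": role_arn,
        "Command": {
            "Name": "pythonshell",      # Python shell job type
            "ScriptLocation": script_location,
            "PythonVersion": "3.9",
        },
        "MaxCapacity": 0.0625,          # smallest Python shell capacity (1/16 DPU)
        "DefaultArguments": {
            "--additional-python-modules": ",".join(modules),
        },
    }


if __name__ == "__main__":
    request = build_pythonshell_job_request(
        name="my-report-job",                                  # hypothetical
        role_arn="arn:aws:iam::123456789012:role/MyGlueRole",  # hypothetical
        script_location="s3://MyBucket/scripts/report.py",     # hypothetical
        modules=["openpyxl==3.0.9"],
    )
    print(request["DefaultArguments"])
    # With boto3 installed and credentials configured, you could then run:
    #   import boto3
    #   boto3.client("glue").create_job(**request)
```

Passing several modules as one comma-separated string mirrors the console parameter shown above.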

For earlier Python versions the following still applies. Assuming you have already downloaded the wheel file and uploaded it to Amazon S3, if you are creating your job via the command line you need to add the parameter:

--default-arguments '{"--extra-py-files": "s3://MyBucket/python/library/openpyxl-3.0.9-py2.py3-none-any.whl"}'
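Because Glue job arguments map strings to strings, several wheels go into one comma-separated string rather than a JSON list. A small sketch that builds the quoted CLI argument with the standard library (the helper name `extra_py_files_cli_arg` is hypothetical, and a POSIX shell is assumed for the quoting):

```python
import json
import shlex


def extra_py_files_cli_arg(wheel_uris):
    """Build the --default-arguments option for `aws glue create-job`."""
    # One string value, comma-separated, not a JSON list.
    default_args = {"--extra-py-files": ",".join(wheel_uris)}
    # shlex.quote wraps the JSON in single quotes for a POSIX shell.
    return "--default-arguments " + shlex.quote(json.dumps(default_args))


print(extra_py_files_cli_arg(
    ["s3://MyBucket/python/library/openpyxl-3.0.9-py2.py3-none-any.whl"]
))
```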

If you are creating or editing the Python shell job in the console:

  • For the new Glue Studio job editor: look under Job details, Advanced properties.

  • For the legacy job editor: look under the Security configuration, script libraries, and job parameters (optional) section.

Once you locate the Python library path text box, paste the full S3 URI of your wheel file.
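Before pasting the URI, it can help to sanity-check it: the value must be a full `s3://bucket/...` URI, and the file name must follow the wheel naming convention from PEP 427 (`{dist}-{version}(-{build})?-{python}-{abi}-{platform}.whl`). A wheel tagged `none-any`, like the openpyxl one above, is pure Python; a platform-specific wheel would also have to match the job's Python version and platform. The helper below is a hypothetical sketch, not part of Glue:

```python
import re
from urllib.parse import urlparse

# PEP 427 wheel file name: {dist}-{version}(-{build})?-{python}-{abi}-{platform}.whl
_WHEEL_RE = re.compile(
    r"^(?P<dist>[^-]+)-(?P<version>[^-]+)(-(?P<build>\d[^-]*))?"
    r"-(?P<python>[^-]+)-(?P<abi>[^-]+)-(?P<platform>[^-]+)\.whl$"
)


def check_wheel_uri(uri):
    """Return the wheel's tags, or raise ValueError if the URI looks wrong."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not a full S3 URI: {uri!r}")
    filename = parsed.path.rsplit("/", 1)[-1]
    match = _WHEEL_RE.match(filename)
    if match is None:
        raise ValueError(f"not a wheel file name: {filename!r}")
    tags = match.groupdict()
    # abi 'none' plus platform 'any' means a platform-independent wheel.
    tags["pure_python"] = tags["abi"] == "none" and tags["platform"] == "any"
    return tags


print(check_wheel_uri(
    "s3://MyBucket/python/library/openpyxl-3.0.9-py2.py3-none-any.whl"
))
```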

I tested it with your library and it works in my environment.

Processing ./glue-python-libs-cr2dddvq/openpyxl-3.0.9-py2.py3-none-any.whl
Collecting et-xmlfile
  Downloading et_xmlfile-1.1.0-py3-none-any.whl (4.7 kB)
Installing collected packages: et-xmlfile, openpyxl
Successfully installed et-xmlfile-1.1.0 openpyxl-3.0.9

Hope this helps,

AWS
Expert
Answered 2 years ago
  • I did the same thing, but it did not solve my problem. I edited the Python shell job in the console using the new Jobs user interface, not the legacy one, and found the Python library path in the Libraries section of the Job details tab.
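When the library path is set but the import still fails, a few lines at the top of the job script can show what the job actually sees at runtime. A minimal sketch (the helper name `report_module` is hypothetical); its printed output should end up in the job run's logs, where the Python version can be compared with the wheel's Python tag:

```python
import importlib
import sys


def report_module(name):
    """Print whether `name` is importable in this run, and from where."""
    try:
        module = importlib.import_module(name)
    except ImportError as exc:
        # Same failure as "No module named ...": show interpreter details.
        print(f"{name}: NOT importable ({exc})")
        print("Python:", sys.version)
        print("sys.path:", sys.path)
        return False
    version = getattr(module, "__version__", "unknown")
    print(f"{name} {version} loaded from {getattr(module, '__file__', '?')}")
    return True


report_module("openpyxl")
```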


Thank you for your question. Without more context it's hard to say what the cause is, but in general I was able to make it work based on this article: https://aws.amazon.com/premiumsupport/knowledge-center/glue-version2-external-python-libraries/

AWS
Alex_T
Answered 2 years ago
  • The shared article does not work either. Right now I am importing just one library, openpyxl, and it fails with "No module named openpyxl". I passed the wheel file downloaded from the internet, and I also tried passing the key-value job parameter (key: --additional-python-modules, value: openpyxl==3.0.9).

  • Hi, the answer is actually incorrect: the linked article applies to AWS Glue Spark jobs, not to Glue Python Shell jobs as asked in the question.

    It could also be improved by mentioning that you can tell whether an error occurs while importing the external library by checking the job's CloudWatch logs.

    If no errors are present, the logs for the job run will show the installed package.
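The install messages quoted in the first answer are pip output, so a successful install ends with a "Successfully installed ..." line. As a sketch, assuming the log text has already been copied out of CloudWatch (fetching it is not shown), a hypothetical helper can pull out the installed packages and versions:

```python
import re

# pip's summary line, e.g. "Successfully installed et-xmlfile-1.1.0 openpyxl-3.0.9"
_INSTALLED_RE = re.compile(r"Successfully installed (?P<pkgs>.+)")


def installed_packages(log_text):
    """Return {package: version} parsed from pip 'Successfully installed' lines."""
    found = {}
    for match in _INSTALLED_RE.finditer(log_text):
        for item in match.group("pkgs").split():
            # pip prints name-version; split on the last hyphen only,
            # since package names such as et-xmlfile contain hyphens.
            name, _, version = item.rpartition("-")
            found[name] = version
    return found


log = """Processing ./glue-python-libs-cr2dddvq/openpyxl-3.0.9-py2.py3-none-any.whl
Collecting et-xmlfile
  Downloading et_xmlfile-1.1.0-py3-none-any.whl (4.7 kB)
Installing collected packages: et-xmlfile, openpyxl
Successfully installed et-xmlfile-1.1.0 openpyxl-3.0.9
"""
print(installed_packages(log))  # {'et-xmlfile': '1.1.0', 'openpyxl': '3.0.9'}
```

An empty result means pip never reported a successful install, which points to the library path or parameter rather than the import itself.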
