Not able to import lxml in AWS Glue Job


Hi Team, I have a complex nested XML file that I want to read with AWS Glue and convert to Parquet format. I want to use the pandas read_xml function to read the file, but I get an error that lxml is not found. I tried adding these two keys to the AWS Glue job parameters: --python-modules-installer-option (value --upgrade) and --additional-python-modules (value lxml==4.9.2). In the code, I import etree from lxml with: from lxml import etree

But I still get the error that the lxml module is not found. My aim is to use the pandas read_xml function to read an XML file in S3 from an AWS Glue job. Please advise.
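
For reference, the two job parameters described above can be written as key/value pairs, for example as the default arguments of a job definition. This is only a sketch of the configuration as described in the question; the variable name glue_default_arguments is hypothetical and nothing here calls AWS.

```python
# Values copied from the question; the dict name is hypothetical.
glue_default_arguments = {
    "--additional-python-modules": "lxml==4.9.2",
    "--python-modules-installer-option": "--upgrade",
}

# Print each parameter so it can be compared with the job configuration.
for key, value in glue_default_arguments.items():
    print(f"{key} = {value}")
```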

Mayura
Asked 10 months ago, viewed 293 times
1 Answer
Accepted Answer

I cannot reproduce that. Is it possible you are running in a VPC without internet access?
Check the error log. You should see a command like the one below and, hopefully, the reason the module was not installed (maybe some conflict):

pip3 install --upgrade --user lxml==4.9.2
Collecting lxml==4.9.2
  Downloading lxml-4.9.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_24_x86_64.whl (7.1 MB)
Checking pymodule installation result for List(lxml==4.9.2): 
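
To confirm the same failure from inside the job script, a small check at the top of the script can report which modules are missing before pandas read_xml is called. This is a minimal sketch using only the standard library; the helper name missing_modules is hypothetical, and it handles top-level module names only.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Hypothetical usage at the top of a Glue script: report missing modules early.
required = ["lxml", "pandas"]
missing = missing_modules(required)
if missing:
    print(f"Modules not installed in this job environment: {missing}")
else:
    print("All required modules are importable.")
```

If lxml shows up as missing here, the install step in the log above is the place to look for the cause.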
AWS Expert
Answered 10 months ago
  • Yes, you are right. Working on a client machine in a private subnet. No internet access. Thank you Sir!
