Not able to import lxml in AWS Glue Job


Hi team, I have a complex nested XML file that I want to read with AWS Glue and convert to Parquet format. I want to use the pandas read_xml function to read the XML file, but I get an error that lxml is not found. So I added these two keys to the AWS Glue job parameters: --python-modules-installer-option (value: --upgrade) and --additional-python-modules (value: lxml==4.9.2). In the code, I import etree from lxml: from lxml import etree

But I still get an error that the lxml module is not found. My aim is to use the pandas read_xml function to read an XML file in S3 from an AWS Glue job. Any guidance is appreciated.
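As a quick check inside the job script, the sketch below reports whether lxml is actually importable at runtime and, if it is not, falls back to the standard-library parser that pandas.read_xml also accepts (parser="etree"). It is a minimal sketch; choose_xml_parser is a hypothetical helper written for this illustration, not part of Glue or pandas.

```python
import importlib.util


def choose_xml_parser() -> str:
    """Return the parser name to pass to pandas.read_xml (hypothetical helper).

    pandas.read_xml defaults to parser="lxml". If lxml is not installed
    (for example because --additional-python-modules could not download it),
    fall back to the standard-library parser, parser="etree".
    """
    # find_spec returns None when the module cannot be located on sys.path
    if importlib.util.find_spec("lxml") is not None:
        return "lxml"
    return "etree"
```

With pandas imported as pd, it would be used as pd.read_xml(path, parser=choose_xml_parser()). Note that the etree parser supports only a limited subset of XPath and does not accept the stylesheet argument, so a deeply nested file may still need lxml once the installation problem is fixed.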

Mayura
Asked 10 months ago · 293 views
1 answer
Accepted answer

I cannot reproduce that. Is it possible you are running in a VPC without internet access?
Check the error log: you should see a command like the one below and, hopefully, the reason the module was not installed (for example, a dependency conflict):

pip3 install --upgrade --user lxml==4.9.2
Collecting lxml==4.9.2
  Downloading lxml-4.9.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_24_x86_64.whl (7.1 MB)
Checking pymodule installation result for List(lxml==4.9.2):
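To test the no-internet hypothesis from inside the job itself, one option is to attempt a TCP connection to PyPI's hosts (pypi.org serves the package index, files.pythonhosted.org serves the wheel downloads). This is a minimal sketch; can_reach is a hypothetical helper written for this illustration. A False result is consistent with a private subnet that has no outbound route to the internet, although a security-group or firewall rule could produce the same result.

```python
import socket


def can_reach(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Try a TCP connection to host:port; return True if it succeeds (hypothetical helper)."""
    try:
        # create_connection resolves the name and connects, honoring the timeout
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # DNS failures, refused connections, and timeouts are all OSError subclasses
        return False
```

For example, print(can_reach("pypi.org"), can_reach("files.pythonhosted.org")) in the job script would print two booleans to the job's output log.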
AWS Expert · answered 10 months ago
  • Yes, you are right. I'm working on a client machine in a private subnet with no internet access. Thank you, sir!
