Not able to import lxml in AWS Glue Job


Hi Team, I have a complex nested XML file that I want to read with AWS Glue and convert to Parquet format. I want to use the pandas read_xml function to read the file, but I get an error that lxml is not found. So I added these two keys to the AWS Glue job parameters: --python-modules-installer-option (value --upgrade) and --additional-python-modules (value lxml==4.9.2). In the code, I import etree from lxml as: from lxml import etree

But I still get the error that the lxml module is not found. Please guide me. My aim is to use the pandas read_xml function to read an XML file in S3 from an AWS Glue job.
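For reference, the setup described above can be sketched in Python as follows. This is a minimal illustration, not a confirmed fix: the dictionary simply restates the two job parameters from the question, and the helper names (module_available, pick_xml_parser) are hypothetical. It relies on the fact that pandas read_xml accepts parser="etree" as an alternative to the default parser="lxml", though the etree parser supports only a limited subset of XPath.

```python
import importlib.util

# The two Glue job parameters exactly as described in the question.
GLUE_JOB_ARGS = {
    "--additional-python-modules": "lxml==4.9.2",
    "--python-modules-installer-option": "--upgrade",
}


def module_available(name: str) -> bool:
    """Return True if the top-level module `name` can be imported here."""
    return importlib.util.find_spec(name) is not None


def pick_xml_parser() -> str:
    """Pick a value for the `parser` argument of pandas.read_xml.

    Uses 'lxml' when it is installed, otherwise falls back to the
    standard-library based 'etree' parser.
    """
    return "lxml" if module_available("lxml") else "etree"


if __name__ == "__main__":
    print("lxml importable:", module_available("lxml"))
    print("parser for pandas.read_xml:", pick_xml_parser())
```

Running the check at the start of the job script shows whether the module installation actually succeeded before read_xml is called.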

Mayura
asked 9 months ago, 272 views
1 Answer
Accepted Answer

I cannot reproduce that. Is it possible you are running in a VPC without internet access?
Check the error log. You should see a command like the one below and, hopefully, the reason the module was not installed (perhaps a conflict):

pip3 install --upgrade --user lxml==4.9.2
Collecting lxml==4.9.2
  Downloading lxml-4.9.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_24_x86_64.whl (7.1 MB)
Checking pymodule installation result for List(lxml==4.9.2): 
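One rough way to test the "no internet access" hypothesis from inside the job is a plain TCP connection attempt. The sketch below is only an assumption about how one might check this: the helper name can_reach is hypothetical, and pypi.org on port 443 is assumed to be the index that pip contacts by default.

```python
import socket


def can_reach(host: str, port: int = 443, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False


# Example call from the job script (assumes pip uses the default index):
#     print("PyPI reachable:", can_reach("pypi.org"))
```

If this returns False in the job, pip cannot download the wheel, which would explain why the module is missing even though the job parameters are set.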
AWS
EXPERT
answered 9 months ago
  • Yes, you are right. Working on a client machine in a private subnet. No internet access. Thank you Sir!
