Not able to import lxml in AWS Glue Job


Hi Team, I have a complex nested XML file that I want to read with AWS Glue and convert to Parquet format. I want to use the pandas read_xml function to read the file, but I get an error that lxml is not found. So I added these two keys to the AWS Glue job parameters: --python-modules-installer-option (value --upgrade) and --additional-python-modules (value lxml==4.9.2). In the code, I import etree from lxml as: from lxml import etree

But I still get the error that the lxml module is not found. Please guide me. My aim is to use the pandas read_xml function to read an XML file in S3 from an AWS Glue job.

Mayura
asked 10 months ago, 293 views
1 Answer
Accepted Answer

I cannot reproduce that. Is it possible you are running in a VPC without internet access?
Check the error log: you should see a command like the one below and, hopefully, the reason the module was not installed (for example, a dependency conflict):

pip3 install --upgrade --user lxml==4.9.2
Collecting lxml==4.9.2
  Downloading lxml-4.9.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_24_x86_64.whl (7.1 MB)
Checking pymodule installation result for List(lxml==4.9.2): 
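
If the log is hard to find, a quick check at the top of the job script can show the same two things: whether lxml is importable, and whether the job can reach the package index at all. The following is only a minimal sketch using the Python standard library. The helper names `module_available` and `can_reach` are hypothetical, and `pypi.org` on port 443 is an assumption about which index pip is using.

```python
import importlib.util
import socket


def module_available(name: str) -> bool:
    """Return True if a top-level module can be found on the import path."""
    return importlib.util.find_spec(name) is not None


def can_reach(host: str = "pypi.org", port: int = 443, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds.

    The default host is an assumption: it is the public package index,
    which a job in a private subnet without a NAT gateway cannot reach.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False


# Example use inside the job script:
# print("lxml importable:", module_available("lxml"))
# print("package index reachable:", can_reach())
```

If the second check prints False, pip cannot download the wheel, and the module will be missing no matter what --additional-python-modules is set to.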
AWS
EXPERT
answered 10 months ago
  • Yes, you are right. I am working on a client machine in a private subnet with no internet access. Thank you, Sir!
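
For a job that has to stay without internet access, one possible workaround is to let pandas fall back to its built-in `etree` parser, which uses the Python standard library instead of lxml. This is a rough sketch, not a tested Glue job. The function names and both path arguments are hypothetical. It assumes the pandas version in the job provides `read_xml` (pandas 1.3 or later), that pyarrow is available for `to_parquet`, and that s3fs is installed if S3 URLs are passed directly. The `etree` parser supports only a limited subset of XPath, so a deeply nested file may need a more specific `xpath` or some pre-processing.

```python
import importlib.util


def pick_xml_parser() -> str:
    """Return "lxml" when it is importable, otherwise the stdlib "etree"."""
    return "lxml" if importlib.util.find_spec("lxml") is not None else "etree"


def read_xml_to_parquet(xml_source, parquet_target, xpath="./*"):
    """Read an XML document into a DataFrame and write it out as Parquet.

    xml_source and parquet_target are hypothetical placeholders; they can be
    local paths, or S3 URLs if s3fs is available in the job environment.
    """
    # Imported here so the parser check above works even without pandas.
    import pandas as pd

    df = pd.read_xml(xml_source, xpath=xpath, parser=pick_xml_parser())
    df.to_parquet(parquet_target, index=False)
    return df
```

If lxml itself is required, the AWS Glue documentation also describes pointing --additional-python-modules at a wheel file stored in S3 instead of a package name, which avoids the download from the public index.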
