Hi,
You can try the steps below to install the Databricks spark-xml library on an EMR cluster.
On the EMR master node:
cd /usr/lib/spark/jars
sudo wget https://repo1.maven.org/maven2/com/databricks/spark-xml_2.11/0.9.0/spark-xml_2.11-0.9.0.jar
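Make sure to select the correct jar for your cluster. The _2.11 suffix in the artifact name is the Scala version, and it must match the Scala version your Spark build uses; 0.9.0 is the spark-xml release. See the guidelines at https://github.com/databricks/spark-xml for which releases support which Spark versions.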
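If it helps, here is a small sketch of how the download URL above is put together, so you can adapt it to another Scala or spark-xml version. The function name spark_xml_jar_url is made up for illustration; the URL layout is simply copied from the wget command above, and the sketch does not check that a given version combination actually exists on Maven Central.

```python
def spark_xml_jar_url(scala_version: str, library_version: str) -> str:
    """Build the Maven Central URL for a spark-xml jar (hypothetical helper)."""
    # Maven artifact names use the Scala binary version, e.g. "2.11" from "2.11.12".
    scala_binary = ".".join(scala_version.split(".")[:2])
    artifact = f"spark-xml_{scala_binary}"
    return (
        "https://repo1.maven.org/maven2/com/databricks/"
        f"{artifact}/{library_version}/{artifact}-{library_version}.jar"
    )


# In a PySpark notebook the Scala version can usually be read from the JVM with
# spark.sparkContext._jvm.scala.util.Properties.versionNumberString()
# (an internal py4j handle, so treat it as a convenience rather than a stable API).
url = spark_xml_jar_url("2.11.12", "0.9.0")
print(url)
```

With the values from this answer, the function reproduces the URL passed to wget above.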
Then, launch your Jupyter notebook and you should be able to run the following:
df = spark.read.format('com.databricks.spark.xml').options(rootTag='objects').options(rowTag='object').load("s3://bucket-name/sample.xml")
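To get a feel for what rowTag='object' selects, here is a rough local illustration using only the Python standard library. It is not how spark-xml works internally, and the sample document is invented, since the contents of sample.xml are not shown here. The idea is just that each element named by rowTag becomes one row, and its child elements become columns.

```python
import xml.etree.ElementTree as ET

# Invented sample data; stands in for the real sample.xml.
SAMPLE_XML = """
<objects>
  <object><id>1</id><name>alpha</name></object>
  <object><id>2</id><name>beta</name></object>
</objects>
""".strip()


def rows_for_tag(xml_text, row_tag):
    """Return one dict per element named row_tag, keyed by child tag (illustrative helper)."""
    root = ET.fromstring(xml_text)
    return [
        {child.tag: child.text for child in element}
        for element in root.iter(row_tag)
    ]


rows = rows_for_tag(SAMPLE_XML, "object")
print(rows)
```

In Spark, the resulting DataFrame would have one row per object element; unlike this sketch, spark-xml also infers column types by default, so id would not necessarily stay a string.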
Please feel free to reach out to AWS Support if you need any further assistance with this.
Thank you!