How to add an additional library (e.g. Databricks spark-xml) to a running EMR cluster and access it in a notebook


Rajeev
asked 5 months ago · 205 views
1 Answer

Hi,

You can try the steps below to install Databricks spark-xml on an EMR cluster.

On the EMR master node:

cd /usr/lib/spark/jars
sudo wget https://repo1.maven.org/maven2/com/databricks/spark-xml_2.11/0.9.0/spark-xml_2.11-0.9.0.jar

Make sure to select the jar that matches your cluster's Spark and Scala versions (the _2.11 suffix in the artifact name is the Scala version), following the guidelines at https://github.com/databricks/spark-xml.
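If you need a different build, the download URL can be derived from the two version numbers. The helper below is only a sketch: the function name spark_xml_jar_url is made up for illustration, and it assumes the jar is published under the same Maven Central path pattern as the wget URL above. In the notebook, spark.version shows the Spark version, and spark-submit --version on the master node prints the Scala version as well.

```python
def spark_xml_jar_url(scala_version: str, library_version: str) -> str:
    """Build the Maven Central URL for a spark-xml jar.

    Assumes the standard Maven repository layout:
    <base>/<artifact>_<scala>/<version>/<artifact>_<scala>-<version>.jar
    """
    base = "https://repo1.maven.org/maven2/com/databricks"
    artifact = f"spark-xml_{scala_version}"
    return f"{base}/{artifact}/{library_version}/{artifact}-{library_version}.jar"


# Reproduces the URL used in the wget command above.
print(spark_xml_jar_url("2.11", "0.9.0"))
```

Check that the resulting URL actually exists before passing it to wget, since not every Scala and library version combination is published.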

Then, launch your Jupyter notebook and you should be able to run the following:

df = spark.read.format('com.databricks.spark.xml').options(rootTag='objects').options(rowTag='object').load("s3://bucket-name/sample.xml")
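If you read several XML files, the one-liner above can be wrapped in a small helper. This is a minimal sketch, not part of spark-xml itself: read_xml is a hypothetical name, spark is the SparkSession the notebook already provides, and the path and row tag are placeholders. It sets only rowTag, which selects the element treated as a row on read; as far as I can tell from the spark-xml README, rootTag is documented as a write option.

```python
def read_xml(spark, path, row_tag):
    """Load an XML file into a DataFrame using the spark-xml data source.

    `spark` is the SparkSession supplied by the notebook; `path` and
    `row_tag` are placeholders for your own file and row element name.
    """
    return (
        spark.read.format("com.databricks.spark.xml")
        .option("rowTag", row_tag)  # element that maps to one DataFrame row
        .load(path)
    )


# Example usage inside the notebook (requires the jar installed as above):
# df = read_xml(spark, "s3://bucket-name/sample.xml", "object")
# df.printSchema()
```

If the load fails with a "Failed to find data source" error, the jar is most likely not on the classpath of the running Spark session, and restarting the notebook kernel after installing it may help.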

Please feel free to reach out to AWS Support if you need any further assistance with this.

Thank you!

AWS
SUPPORT ENGINEER
answered 5 months ago
