How to add an additional library (e.g., Databricks spark-xml) to a running EMR cluster and access it in a Notebook


Rajeev
Asked 5 months ago · 205 views

1 answer

Hi,

You can try the steps below to install the Databricks spark-xml library on an EMR cluster.

On the EMR master node:

cd /usr/lib/spark/jars
sudo wget https://repo1.maven.org/maven2/com/databricks/spark-xml_2.11/0.9.0/spark-xml_2.11-0.9.0.jar

Make sure to select the correct jar according to your Spark version and the guidelines provided in https://github.com/databricks/spark-xml.
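If you are not sure which Scala build your Spark installation uses, you can check from a notebook cell attached to the cluster before downloading the jar. A minimal sketch; note that the Scala version lookup goes through PySpark's internal JVM gateway, so treat it as a convenience check rather than a stable API:

# Print the Spark version reported by the attached cluster
print("Spark version:", spark.version)

# Print the Scala version (e.g. "version 2.11.12") via the JVM gateway,
# so you can pick spark-xml_2.11 vs spark-xml_2.12 accordingly
print("Scala:", spark.sparkContext._jvm.scala.util.Properties.versionString())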

Then, launch your Jupyter notebook and you should be able to run the following:

df = spark.read.format('com.databricks.spark.xml').options(rootTag='objects', rowTag='object').load("s3://bucket-name/sample.xml")
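As a quick sanity check that the jar was picked up and the XML was parsed, you can inspect the resulting DataFrame and, if you like, write it back out. A small follow-up sketch; the S3 output path below is a placeholder:

# Nested XML elements show up as struct/array columns in the inferred schema
df.printSchema()
df.show(5, truncate=False)

# Optionally persist the parsed data in a columnar format for later queries
df.write.mode("overwrite").parquet("s3://bucket-name/sample-parquet/")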

Please feel free to reach out to AWS Support if you need any further assistance with this.

Thank you!

AWS
Support Engineer
Answered 5 months ago
