2 Answers
Hello,
You can follow the steps below in Zeppelin to install packages at runtime. This method works in client deploy mode.
- In a bootstrap script, run the following command to give other users (such as the zeppelin user) access to the /home directory, so that the Zeppelin service can install packages into the /home/.local directory.
sudo chmod 757 /home
- Add the following settings to the spark interpreter in the Zeppelin UI, then restart the interpreter.
spark.pyspark.virtualenv.enabled true
spark.pyspark.virtualenv.bin.path /usr/bin/virtualenv
spark.pyspark.virtualenv.type native
spark.pyspark.python python3
- Now install packages from a notebook paragraph with the following command:
%spark.pyspark
sc.install_pypi_package("xgboost")
Hi,
Have a look at https://medium.com/@techboomph/getting-zeppelin-to-work-with-emr-93e237ac446a
The author proposes a solution for running the pip install you need for a Zeppelin notebook on EMR.
Didier
Thanks for sharing, David. So it looks like he is suggesting including the pip install in the bootstrap script, which means the cluster would need to be recreated. I could try that. However, I believe something similar to Jupyter, where you can run pip install in the notebook itself, should be available in Zeppelin. I see that the %conda interpreter is loaded, but I can't get it to work: if I type %conda install happybase, it just says command (install) not found.