
Installing additional libraries to PySpark kernel


JupyterHub on Amazon EMR comes with a default PySpark kernel. How can I install additional libraries on this kernel (e.g. numpy)? I tried following the instructions at https://aws.amazon.com/blogs/big-data/install-python-libraries-on-a-running-cluster-with-emr-notebooks/. However, I cannot install even simple libraries like pandas:

sc.install_pypi_package("pandas==0.25.1")
An error was encountered:
No module named 'six'
Traceback (most recent call last):
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/context.py", line 1108, in install_pypi_package
    pypi_package = self._validate_package(pypi_package)
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/context.py", line 1173, in _validate_package
    import six
ModuleNotFoundError: No module named 'six'

And I get the same error if I try to install six:

sc.install_pypi_package("pandas==0.25.1")

asked 2 years ago, 2166 views
2 Answers

One possible cause of this issue is that the PySpark kernel does not have access to the required Python libraries. To install additional libraries for the PySpark kernel, you need to make sure they are available on the EMR cluster. Here are some steps you can take:

1. Install the libraries on the EMR cluster using the pip command. For example:

!pip install pandas==0.25.1

2. Make sure that the libraries are available on all the worker nodes in the EMR cluster. You can do this by setting PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON in the spark-env.sh file on the worker nodes so that they point to a Python interpreter that has the libraries installed.

3. Restart the PySpark kernel in JupyterHub to make the libraries available to the PySpark kernel.
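As a rough sketch of the install step above, the following checks which modules the current interpreter cannot import (for example the six module from the traceback in the question) and builds a pip command tied to that same interpreter. The helper names missing_modules, pip_install_command and install are made up for this illustration and are not part of PySpark or EMR. It only affects the node where the code runs, so the worker nodes still need the libraries installed separately.

```python
import importlib.util
import subprocess
import sys


def missing_modules(module_names):
    """Return the top-level module names that this interpreter cannot import."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


def pip_install_command(requirement, python=sys.executable):
    """Build a pip command bound to a specific interpreter (does not run it)."""
    return [python, "-m", "pip", "install", requirement]


def install(requirement):
    """Run pip for the current interpreter; raises CalledProcessError on failure."""
    subprocess.run(pip_install_command(requirement), check=True)


if __name__ == "__main__":
    # Show which Python is actually executing this code and what it lacks.
    print("interpreter:", sys.executable)
    print("missing:", missing_modules(["six", "pandas"]))
    # install("six")  # assumption: needs network access and write permission
```

Calling pip through sys.executable rather than a bare pip command helps avoid installing into a different Python than the one the kernel uses, which is one way a package can appear installed and still fail to import.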

answered 2 years ago

For a complete guide on how to install additional kernels and libraries on EMR JupyterHub, please read the documentation page here

AWS
EXPERT
answered 2 years ago
