How to install additional local python libraries in AWS EMR notebooks

1

I am using both the PySpark kernel and a local Python kernel (%%local) in a single EMR notebook. I can install packages for the PySpark kernel with an EMR bootstrap action, but I cannot install additional local Python libraries (s3fs and other packages) the same way. Could you please provide guidance on this?

Asked 2 years ago · 3527 views
2 Answers
2
Accepted Answer

Are you receiving a specific error during installation? Please see the documentation below on installing Python libraries in EMR notebooks:

AWS
Taka_M
answered 2 years ago
  • I went through the links above. I can install local Python packages with %%local pip install <packagename> in a Jupyter notebook using the PySpark kernel, but I have to repeat this in every notebook session. Can additional local Python packages be installed directly through a bootstrap action for PySpark kernels?

0

Unfortunately, the content posted by @Taka_M is outdated.

I had the same question and posted an answer at https://stackoverflow.com/a/77750780/2000548

It also helps automate creating a JupyterLab kernel during EMR provisioning.

Here is a copy:


I found out that JupyterLab's Python is separate from the EMR cluster's custom Python version.

I first need to create a new conda Python 3.11 environment for JupyterLab, and then register it as a new kernel.

Because JupyterLab is installed after the bootstrap script runs, I need to add an EMR step with this script:

#!/usr/bin/env bash
# Stop at the first failing command.
set -e

echo "# Install JupyterLab-scoped dependencies"
PYTHON_VERSION=3.11.7
# Create a dedicated conda environment alongside the notebook's own Python.
sudo /emr/notebook-env/bin/conda create --name="python${PYTHON_VERSION}" "python=${PYTHON_VERSION}" --yes
# Quote the extras spec so the shell never treats [spark] as a glob pattern.
sudo "/emr/notebook-env/envs/python${PYTHON_VERSION}/bin/python" -m pip install \
  "apache-sedona[spark]==1.5.0" \
  attrs==23.1.0 \
  descartes==1.1.0 \
  ipykernel==6.28.0 \
  matplotlib==3.8.2 \
  pandas==2.1.4 \
  shapely==2.0.2

echo "# Add JupyterLab kernel"
# Register the new environment's interpreter as a Jupyter kernel.
sudo "/emr/notebook-env/envs/python${PYTHON_VERSION}/bin/python" -m ipykernel install --name="python${PYTHON_VERSION}"
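
The answer does not show how the step itself is added, so here is a minimal sketch of one way to describe such a step in Python. It assumes the script has been uploaded to S3 and is run through EMR's script-runner.jar; the bucket, script name, region, and the helper name build_script_step are hypothetical, not taken from the original post.

```python
def build_script_step(script_s3_uri, region, name="Install JupyterLab kernel"):
    """Return an EMR step definition that runs a shell script stored in S3.

    Assumption: the script is executed via the regional script-runner.jar.
    """
    return {
        "Name": name,
        # Keep the cluster running even if the script fails.
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            "Jar": f"s3://{region}.elasticmapreduce/libs/script-runner/script-runner.jar",
            "Args": [script_s3_uri],
        },
    }


# Hypothetical bucket and script name, for illustration only.
step = build_script_step(
    "s3://my-bucket/install-jupyterlab-kernel.sh", region="us-west-2"
)
print(step)

# Submitting it with boto3 would look roughly like this (cluster ID is a placeholder):
# import boto3
# emr = boto3.client("emr", region_name="us-west-2")
# emr.add_job_flow_steps(JobFlowId="j-XXXXXXXXXXXXX", Steps=[step])
```

The same dictionary can also be passed in the Steps list when the cluster is created, so the kernel is registered as part of provisioning.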

Now the new Python 3.11 kernel shows up in JupyterLab.
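
If you want to confirm the registration without opening the UI, a rough sketch like the following can read the kernel specs from disk. It assumes the kernel was installed system-wide under /usr/local/share/jupyter/kernels, which is where ipykernel install usually writes when run without --user or --prefix; the path on your cluster may differ, and list_kernels is a hypothetical helper name.

```python
import json
from pathlib import Path


def list_kernels(kernels_dir="/usr/local/share/jupyter/kernels"):
    """Map each kernel name found in kernels_dir to the interpreter it launches."""
    kernels = {}
    root = Path(kernels_dir)
    if not root.is_dir():
        # Nothing registered here (or the assumed location is wrong).
        return kernels
    for spec_file in sorted(root.glob("*/kernel.json")):
        spec = json.loads(spec_file.read_text())
        argv = spec.get("argv", [])
        # The first argv entry is the Python executable used by the kernel.
        kernels[spec_file.parent.name] = argv[0] if argv else None
    return kernels


print(list_kernels())
```

A kernel named python3.11.7 pointing at the interpreter under /emr/notebook-env/envs/ would indicate the step ran as intended.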

And it prints the correct Python version:

import sys

print(sys.version_info)
# sys.version_info(major=3, minor=11, micro=7, releaselevel='final', serial=0)
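
Along the same lines, here is a small sketch for checking from inside the new kernel that the pinned packages landed in its environment. It uses only the standard library; installed_versions is a hypothetical helper name, and the package list simply mirrors a few entries from the script above.

```python
from importlib import metadata


def installed_versions(names):
    """Return {package name: installed version, or None if not installed}."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


print(installed_versions(["pandas", "shapely", "ipykernel"]))
```

A None value means the package is missing from the environment backing the current kernel, which usually points to the notebook running on a different kernel than the one the step registered.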

answered 4 months ago
AWS
Support Engineer
reviewed 2 days ago