How to install additional local Python libraries in AWS EMR notebooks


I am using both the PySpark kernel and the local Python kernel (%%local) in a single EMR notebook. I can install packages for the PySpark kernel using an EMR bootstrap action, but I am unable to install additional local Python libraries (s3fs and other packages) the same way. Could you please provide guidance on this?

Asked 2 years ago, 3527 views
2 Answers
Accepted Answer

Are you receiving any specific error during installation? Please see the documentation below on installing Python libraries in EMR notebooks:

Answered 2 years ago
  • I had gone through the above links. I am able to install local Python packages using %%local pip install <packagename> in the Jupyter notebook PySpark kernel, but I have to repeat this for every notebook session. Can additional local Python packages be installed directly with a bootstrap action for PySpark kernels?


Unfortunately, the content posted by @Taka_M is outdated.

I had the same question and posted the answer at

It also helps automate creating a JupyterLab kernel during EMR provisioning.

Here is a copy:

I found out that the JupyterLab Python is separate from the EMR cluster's custom Python version.

I first need to create a new conda Python 3.11 environment for JupyterLab, and then register it as a new kernel.

Because JupyterLab gets installed after the bootstrap script runs, I need to add an EMR step with this script:

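#!/usr/bin/env bash
set -e

# PYTHON_VERSION must be defined before the commands below run.
# Defaulting to 3.11 (the version used in this answer) is an assumption added here.
PYTHON_VERSION="${PYTHON_VERSION:-3.11}"

echo "# Install JupyterLab-scoped dependencies"
# Create a dedicated conda environment inside the notebook's conda installation.
sudo /emr/notebook-env/bin/conda create --name="python${PYTHON_VERSION}" "python=${PYTHON_VERSION}" --yes
# Install packages with that environment's own pip; ipykernel is needed to register the kernel.
# The extras spec is quoted so the shell does not treat [spark] as a glob pattern.
sudo "/emr/notebook-env/envs/python${PYTHON_VERSION}/bin/python" -m pip install \
  "apache-sedona[spark]==1.5.0" \
  attrs==23.1.0 \
  descartes==1.1.0 \
  ipykernel==6.28.0 \
  matplotlib==3.8.2 \
  pandas==2.1.4

echo "# Add JupyterLab kernel"
sudo "/emr/notebook-env/envs/python${PYTHON_VERSION}/bin/python" -m ipykernel install --name="python${PYTHON_VERSION}"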
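
For anyone wondering how to attach the script above as an EMR step, here is a minimal sketch in Python. It is not taken from the original answer. It assumes the script has been uploaded to S3 and is run through EMR's script-runner.jar. The bucket name, script name, and cluster ID are hypothetical placeholders, and boto3 must be installed wherever you call submit_step. Note that the script reads PYTHON_VERSION, so that value needs to be defined in the script itself or in its environment.

```python
def build_kernel_setup_step(script_s3_uri, region):
    """Build an EMR step definition that runs a shell script via script-runner.jar.

    script_s3_uri and region are supplied by the caller; nothing here is
    specific to the original answer's account or cluster.
    """
    return {
        "Name": "Install JupyterLab kernel",
        # CONTINUE keeps the cluster running even if the step fails.
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            "Jar": f"s3://{region}.elasticmapreduce/libs/script-runner/script-runner.jar",
            "Args": [script_s3_uri],
        },
    }


def submit_step(cluster_id, step, region):
    """Submit the step to a running cluster and return the new step ID."""
    import boto3  # imported lazily so the builder above works without boto3

    emr = boto3.client("emr", region_name=region)
    response = emr.add_job_flow_steps(JobFlowId=cluster_id, Steps=[step])
    return response["StepIds"][0]
```

A call would look like submit_step("j-XXXXXXXXXXXXX", build_kernel_setup_step("s3://example-bucket/scripts/install_kernel.sh", "us-east-1"), "us-east-1"), with every value replaced by your own. The same step definition can also be included in the Steps list when the cluster is created, so the kernel is registered during provisioning.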

The new Python 3.11 kernel now appears in JupyterLab.

And it prints the correct Python version:

import sys

print(sys.version_info)
# sys.version_info(major=3, minor=11, micro=7, releaselevel='final', serial=0)
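
Beyond the version check, it can be useful to confirm from inside the new kernel that the packages are actually importable. The sketch below is an addition, not part of the original answer, and uses only the standard library. The module list is a hypothetical example; import names can differ from pip package names (for instance, apache-sedona is imported as sedona).

```python
import importlib.util
import sys


def missing_modules(module_names):
    """Return the top-level module names that this interpreter cannot find."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


def kernel_matches(expected_major, expected_minor):
    """Check whether the running interpreter has the expected major.minor version."""
    return sys.version_info[:2] == (expected_major, expected_minor)


if __name__ == "__main__":
    # Hypothetical list; adjust it to the packages your kernel should provide.
    wanted = ["pandas", "matplotlib", "sedona", "s3fs"]
    print("Interpreter:", sys.executable)
    print("Is Python 3.11:", kernel_matches(3, 11))
    print("Missing modules:", missing_modules(wanted))
```

Printing sys.executable also shows which environment the kernel is running from, which helps tell the JupyterLab environment apart from the cluster's own Python.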


Answered 4 months ago
Support Engineer
Reviewed 2 days ago
