How to upgrade the Python version in EMR (since Python 3.7 support is discontinued)


I am using EMR 6.13.0, which uses Python 3.7. My code uses boto3, and boto3 support for Python 3.7 will be discontinued in December 2023; as we are aware, support for Python 3.7 itself has stopped as well. Is there any way in EMR to upgrade the Python version? I tried running 'sudo pip3 install python3.10' in an EMR bootstrap action, but I could not see any change. Please advise how to upgrade the Python version from 3.7 to 3.10 or above. Is there any reason EMR is still using Python 3.7?

asked a year ago, 5758 views
2 Answers
Accepted Answer

Hello There,

Thank you for the query.

Please see my response below:

I would like to inform you that the EMR service team is already working on upgrading the Python version in EMR. Unfortunately, I cannot provide a specific date for this yet, as we don't have any visibility into the release team's product pipeline and enhancements.

In the meantime, if you would like to upgrade your cluster to a higher version of Python, I can suggest the following two workarounds.

  1. You can follow these instructions [1], which provide step-by-step guidance on using a custom Python 3 version on EMR.
  2. Alternatively, if you would like to upgrade EMR to, for instance, Python 3.9, here is a sample bootstrap script. Once the cluster is ready, you can verify that it succeeded by SSHing into the primary node and running "python3 --version" and "pyspark".

===============================================

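#!/bin/bash
# Stop at the first failing step so a broken build is never symlinked over python3
set -ex
sudo yum install -y libffi-devel
# Download and unpack the CPython 3.9.0 source
sudo wget https://www.python.org/ftp/python/3.9.0/Python-3.9.0.tgz
sudo tar -zxvf Python-3.9.0.tgz
cd Python-3.9.0
# Optimized build; "altinstall" installs /usr/local/bin/python3.9 without touching python3
sudo ./configure --enable-optimizations
sudo make -j "$(nproc)" altinstall
# Install the AWS CLI for the new interpreter (current user only)
python3.9 -m pip install --upgrade awscli --user
# Point python3 at the new interpreter (this overrides the system default)
sudo ln -sf /usr/local/bin/python3.9 /usr/bin/python3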

===============================================

Since you have already tried a bootstrap action, I suggest trying this bootstrap script, as it is the easier and quicker solution.
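The verification step for option 2 (checking the output of "python3 --version" on the primary node) can also be scripted. This is a minimal sketch: `version_at_least` and `python3_version` are hypothetical helper names, not part of EMR or the AWS CLI, and the comparison assumes GNU `sort -V`, which Amazon Linux provides.

```shell
#!/bin/bash
# Hypothetical helper: succeeds (exit 0) when version $1 >= minimum version $2.
# GNU "sort -V" orders version strings numerically, so 3.10 sorts after 3.9.
version_at_least() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

# Hypothetical helper: print only the version number, e.g. "3.9.0"
# from the "Python 3.9.0" line that python3 --version prints.
python3_version() {
  python3 --version 2>&1 | awk '{print $2}'
}
```

For example, on the primary node: `version_at_least "$(python3_version)" 3.9 && echo "python3 is 3.9 or newer"`. The interpreter PySpark uses still has to be checked separately, for instance by starting `pyspark` and reading the Python version in its startup banner.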

Please let me know if the above solutions work for you or if you have any further questions on this.

References:

[1] https://github.com/aws-samples/aws-emr-utilities/blob/main/utilities/emr-ec2-custom-python3/README.md#reducing-cluster-start-time

Rajiv_M, AWS Support Engineer
answered a year ago
Reviewed by an expert 3 months ago
  • Thanks @Rajiv_M for your response. I tried method 2 (bootstrap), and it works fine.

    I would like to know whether you can share the timeline for an EMR release with an updated Python version (specifically, whether it will be available before December 2023). Our only reason for the upgrade is this message from boto3: 'Boto3 will no longer support Python 3.7 starting December 13, 2023'. We do not have any other technical requirement for upgrading Python in EMR.

  • Hello There,

    Thank you for getting back to me. I am glad to hear that one of the solutions worked for you.

    Regarding the timeline, unfortunately I can't give you a fixed date, as I don't have visibility into the development team's release timeline. However, as I understand it, Python is upgraded to 3.9 in our Amazon Linux 2023 (AL2023) based releases, and AL2023 was just released in EMR on EKS [1]. You can therefore expect support for a newer Python version on EMR on EC2 soon. Please keep an eye on the EMR release pages [2] for an announcement.

    References:

    [1] https://docs.aws.amazon.com/emr/latest/EMR-on-EKS-DevelopmentGuide/emr-eks-6.13.0.html
    [2] https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-release-components.html


In most Amazon EMR release versions, cluster instances and system applications use different Python versions by default.

Option 1: To upgrade the Python version that PySpark uses, set the PYSPARK_PYTHON environment variable in the spark-env classification to the path of the required Python interpreter (https://repost.aws/knowledge-center/emr-pyspark-python-3x).

Option 2: If you want a different Python altogether, you may need to pick another AMI or a different container image.
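For Option 1, the spark-env classification is supplied as a JSON configuration when the cluster is created. The sketch below prints such a configuration. The helper name `spark_env_config` is hypothetical, and the interpreter path you pass is an assumption to adjust to your setup (for example `/usr/local/bin/python3.9`, which is where the bootstrap script in the accepted answer installs Python).

```shell
#!/bin/bash
# Hypothetical helper: print an EMR configuration (JSON) that exports
# PYSPARK_PYTHON through the spark-env classification.
# $1 = full path to the Python interpreter PySpark should use (assumed to exist on every node)
spark_env_config() {
  local python_path="$1"
  cat <<EOF
[
  {
    "Classification": "spark-env",
    "Configurations": [
      {
        "Classification": "export",
        "Properties": {
          "PYSPARK_PYTHON": "${python_path}"
        }
      }
    ]
  }
]
EOF
}
```

For example, save the output with `spark_env_config /usr/local/bin/python3.9 > spark-env.json` and pass it to `aws emr create-cluster` with `--configurations file://spark-env.json`, alongside your usual cluster options. The interpreter must already be installed at that path on all nodes (for example by a bootstrap action), otherwise PySpark jobs will fail to start.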

AWS
answered a year ago
