1 Answer
It looks like your EMR Spark job is not able to find the packages installed in your virtual environment. To make sure Spark uses the Python interpreter from your virtual environment, try the following:
- Add the following line to your EMR Spark job configuration. It fixes Python's hash seed so that hashing is consistent across executors; on its own it does not select the Python binary from your virtual environment:
"spark.executorEnv.PYTHONHASHSEED":"0"
- In your PySpark code, before the SparkSession (or SparkContext) is created, add the following lines to explicitly set the Python interpreter to use:
import os
os.environ['PYSPARK_PYTHON'] = './environment/bin/python'
os.environ['PYSPARK_DRIVER_PYTHON'] = './environment/bin/python'
- Make sure that the pyspark_venv.tar.gz file is uploaded to your S3 bucket and that the role running the job has read permission on it.
- Verify that the virtual environment is successfully extracted by checking the logs in the yarn/userlogs directory.
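As a complement to checking the logs, a small diagnostic can show which interpreter a process is really running. The sketch below uses only the standard library; the helper names `interpreter_report` and `same_major_minor` are hypothetical and are not part of PySpark or EMR. It assumes the archive is unpacked under the alias `environment`, which is what the `./environment/bin/python` path above implies.

```python
import os
import sys


def interpreter_report():
    """Describe the Python interpreter running the current process."""
    return {
        # Absolute path of the running interpreter binary.
        "executable": sys.executable,
        # Version as "major.minor.micro".
        "version": "%d.%d.%d" % sys.version_info[:3],
        # Value Spark would read, or None if it is not set.
        "pyspark_python": os.environ.get("PYSPARK_PYTHON"),
    }


def same_major_minor(version_a, version_b):
    """Return True when two 'X.Y[.Z]' strings share major and minor parts."""
    return version_a.split(".")[:2] == version_b.split(".")[:2]


if __name__ == "__main__":
    # Hypothetical helper output; print it from the driver to inspect it.
    print(interpreter_report())
```

Printing `interpreter_report()` on the driver and, for example, mapping it over a small RDD to collect the same report from the executors should show whether both sides run the interpreter from the virtual environment. PySpark expects the driver and the workers to share the same major.minor Python version, so `same_major_minor` applied to the two reported versions is a quick way to spot a mismatch.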
Thanks for the answer. After further testing, the problem turned out to be an incompatible Python version.