The Python environment that the EMR notebook uses is "/emr/notebook-env/bin/python", which is different from the default "/usr/bin/python". This is why you observe the differences. You may also notice a difference between the output of pip list and !pip list when you run them from the EMR notebook; the explanation is the same.
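To see the two environments side by side, a small check like the one below can be run on the cluster node. It is only a sketch: the helper name describe_python is made up for illustration, and the two paths are the ones mentioned above.

```shell
#!/bin/bash
# describe_python INTERPRETER
# Hypothetical helper: prints the interpreter path, its environment root and
# its version, or a note if the interpreter is not present on this node.
describe_python() {
  local interpreter="$1"
  if [ -x "$interpreter" ]; then
    # sys.executable is the running interpreter; sys.prefix is its environment root.
    "$interpreter" -c 'import sys; print(sys.executable, sys.prefix, sys.version.split()[0])'
  else
    echo "$interpreter: not found"
  fi
}

describe_python /usr/bin/python
describe_python /emr/notebook-env/bin/python
```

If the two lines of output show different environment roots, packages installed into one will not be visible from the other.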
As a next step, you have two options:
- Install the Python dependency manually from the EMR notebook, if needed.
- To automate the installation on the EMR cluster that you use with the EMR notebook, consider running the script below as a bootstrap action [1], so that the dependency is installed in both Python environments:
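    #!/bin/bash
    # Install into the default system Python environment
    sudo pip3 install <dependency>
    # Install into the Python environment used by the EMR notebook
    sudo /emr/notebook-env/bin/pip install <dependency>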
The catch is that you need to use a delayed bootstrap action script, so that the installation runs only after the EMR cluster reaches the WAITING state; see https://repost.aws/knowledge-center/emr-update-all-nodes-bootstrap. The delay is needed because, when a bootstrap action runs at its default time, the /emr/notebook-env path does not exist yet, so the bootstrap action fails and the cluster terminates.
As you might already be aware, by default a bootstrap action runs before the application provisioning phase of the EMR cluster.
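The idea of delaying the installation can be sketched as polling until the notebook environment's pip exists and only then installing. This is not the script from the linked article, just a simplified illustration; the function names, the 1800-second timeout and the 10-second interval are assumptions chosen for the example.

```shell
#!/bin/bash
# wait_for_path PATH TIMEOUT_SECONDS INTERVAL_SECONDS
# Returns 0 once PATH exists, or 1 if TIMEOUT_SECONDS elapse first.
wait_for_path() {
  local path="$1" timeout="$2" interval="$3" waited=0
  while [ ! -e "$path" ]; do
    if [ "$waited" -ge "$timeout" ]; then
      return 1
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
  return 0
}

# install_when_ready PIP_PATH PACKAGE
# Hypothetical helper: installs PACKAGE with the pip at PIP_PATH once it appears.
install_when_ready() {
  local pip_path="$1" package="$2"
  if wait_for_path "$pip_path" 1800 10; then
    sudo "$pip_path" install "$package"
  else
    echo "Timed out waiting for $pip_path" >&2
    return 1
  fi
}
```

In a bootstrap action, a call such as install_when_ready /emr/notebook-env/bin/pip <dependency> would presumably need to be started in the background (for example with a trailing &), so that the bootstrap action itself returns and the cluster can continue provisioning.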
References: [1]: https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-plan-bootstrap.html