How can I be sure that manually installed libraries persist in Amazon SageMaker if my lifecycle configuration times out when I try to install the libraries?
When I try to install additional libraries, my lifecycle configuration scripts run for more than five minutes. This causes the Amazon SageMaker notebook instance to time out. I want to resolve this issue. Also, I want to be sure that my manually installed libraries persist between notebook instance sessions.
Short description
If a lifecycle configuration script runs for longer than five minutes, the script fails, and the notebook instance isn't created or started.
Use one of the following methods to resolve this issue:
- nohup: The nohup command, short for "no hangup," is a Linux command that ignores the hangup (SIGHUP) signal. Using this command with an ampersand (&) at the end forces the lifecycle configuration script to run in the background until the packages are installed. This method is a best practice for less technical users and is more appropriate as a short-term solution.
Note: The shell that runs the lifecycle configuration script is terminated at the end of the script. Therefore, add nohup at the beginning of the command and & at the end of the command so that the script continues to run in the background.
- Create a custom, persistent Conda installation on the notebook instance's Amazon Elastic Block Store (Amazon EBS) volume: Run the on-create script in the terminal of an existing notebook instance. This script uses Miniconda to create a separate Conda installation on the EBS volume (/home/ec2-user/SageMaker/). Then, run the on-start script as a lifecycle configuration to make the custom environment available as a kernel in Jupyter. This method is a best practice for more technical users, and it is a better long-term solution.
Resolution
Use one of the following methods to resolve lifecycle configuration timeouts.
Run the nohup command
Use the nohup command to force the lifecycle configuration script to continue running in the background even after the five-minute timeout period expires. Be sure to add the ampersand (&) at the end of the command.
Example:
#!/bin/bash
set -e

nohup pip install xgboost &
The script stops running after the libraries are installed. You aren't notified when this happens, but you can use the ps command to find out if the script is still running.
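For example, a command similar to the following in the notebook instance terminal shows whether the pip installation from the preceding script is still running. The pattern that you search for depends on your own script:

ps aux | grep "pip install"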
Note: You can also use the nohup command if your lifecycle configuration script times out in other scenarios, such as when you download large Amazon Simple Storage Service (Amazon S3) objects.
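For example, a lifecycle configuration script similar to the following uses nohup for a large Amazon S3 download. The bucket name and object key shown here are placeholders, not values from your account:

#!/bin/bash
set -e

# Download a large object in the background so that the script
# returns before the five-minute lifecycle configuration timeout.
nohup aws s3 cp s3://amzn-s3-demo-bucket/large-dataset.tar.gz /home/ec2-user/SageMaker/ &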
Create a custom persistent Conda installation on the notebook instance's EBS volume
1. In the terminal of an existing notebook instance, create a .sh file using your preferred editor.
Example:
vim custom-script.sh
2. Copy the contents of the on-create script into the .sh file. This script creates a new Conda environment in a custom Conda installation. This script also installs NumPy and Boto3 in the new Conda environment.
Note: The notebook instance must have internet connectivity to download the Miniconda installer and ipykernel.
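The following is a minimal sketch of what such an on-create script typically contains. The Miniconda installer URL, the Python version, and the environment name (custom_python) are assumptions; use the official on-create sample script from the lifecycle configuration samples for the exact contents:

#!/bin/bash
set -e

sudo -u ec2-user -i <<'EOF'
unset SUDO_UID

# Install a separate Miniconda installation on the persistent EBS volume.
WORKING_DIR=/home/ec2-user/SageMaker/custom-miniconda
mkdir -p "$WORKING_DIR"
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O "$WORKING_DIR/miniconda.sh"
bash "$WORKING_DIR/miniconda.sh" -b -u -p "$WORKING_DIR/miniconda"
rm -f "$WORKING_DIR/miniconda.sh"

# Create a custom Conda environment and install ipykernel plus the example libraries.
# The Python version is an assumption.
source "$WORKING_DIR/miniconda/bin/activate"
conda create --yes --name custom_python python=3.10
conda activate custom_python
pip install --quiet ipykernel numpy boto3
EOF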
3. Mark the script as executable, and then run it.
Example:
chmod +x custom-script.sh
./custom-script.sh
4. When installation is complete, stop the notebook instance.
5. Copy the on-start script into a .sh file.
#!/bin/bash
set -e

# OVERVIEW
# This script installs a custom, persistent installation of Conda on the notebook instance's
# EBS volume, and ensures that these custom environments are available as kernels in Jupyter.
#
# The on-start script uses the custom Conda environment created in the on-create script and
# uses the ipykernel package to add that environment as a kernel in Jupyter.
#
# For another example, see:
# https://docs.aws.amazon.com/sagemaker/latest/dg/nbi-add-external.html#nbi-isolated-environment

sudo -u ec2-user -i <<'EOF'
unset SUDO_UID

WORKING_DIR=/home/ec2-user/SageMaker/custom-miniconda/
source "$WORKING_DIR/miniconda/bin/activate"

# Register every environment in the custom Miniconda installation as a Jupyter kernel.
for env in $WORKING_DIR/miniconda/envs/*; do
    BASENAME=$(basename "$env")
    source activate "$BASENAME"
    python -m ipykernel install --user --name "$BASENAME" --display-name "Custom ($BASENAME)"
done

# Optionally, uncomment these lines to disable SageMaker-provided Conda functionality.
# echo "c.EnvironmentKernelSpecManager.use_conda_directly = False" >> /home/ec2-user/.jupyter/jupyter_notebook_config.py
# rm /home/ec2-user/.condarc
EOF

echo "Restarting the Jupyter server.."

# For notebook instance with alinux (notebook-al1-v1)
initctl restart jupyter-server --no-wait

# Use this instead for notebook instance with alinux2 (notebook-al2-v1)
# systemctl restart jupyter-server
6. On the stopped notebook instance, add the on-start script as a lifecycle configuration. This script makes the custom environment available as a kernel in Jupyter every time that you start the notebook instance.
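If you prefer the AWS CLI to the SageMaker console, you can create the lifecycle configuration and attach it to the stopped notebook instance with commands similar to the following. The configuration name (custom-conda-on-start), the script file name (on-start.sh), and the notebook instance name (my-notebook-instance) are placeholders:

# Base64 encode the on-start script and create the lifecycle configuration.
aws sagemaker create-notebook-instance-lifecycle-config \
    --notebook-instance-lifecycle-config-name custom-conda-on-start \
    --on-start Content="$(base64 -w0 on-start.sh)"

# Attach the lifecycle configuration to the stopped notebook instance.
aws sagemaker update-notebook-instance \
    --notebook-instance-name my-notebook-instance \
    --lifecycle-config-name custom-conda-on-start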
7. Start the notebook instance, and then install your custom libraries in the custom environment.
For example, to install pyarrow:
import sys
!conda install --yes --prefix {sys.prefix} -c conda-forge pyarrow
If you get an error message that says that you need to update Conda, run the following commands. Then, try installing the custom libraries again.
!conda install -p "/home/ec2-user/anaconda3" "conda>=4.8" --yes
!conda install -p "/home/ec2-user/SageMaker/custom-miniconda/miniconda" "conda>=4.8" --yes
If you stop and then start your notebook instance, your custom Conda environment and libraries are still available. You don't have to install them again.
Note: You can use Amazon CloudWatch Logs to troubleshoot issues with lifecycle configuration scripts. For notebook instances, you can view the script execution logs in the /aws/sagemaker/NotebookInstances log group, in the log stream [notebook-instance-name]/LifecycleConfigOnStart.
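For example, you can retrieve the lifecycle configuration log events with a command similar to the following. The notebook instance name is a placeholder:

aws logs filter-log-events \
    --log-group-name /aws/sagemaker/NotebookInstances \
    --log-stream-name-prefix my-notebook-instance/LifecycleConfigOnStart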
Related information
Amazon SageMaker notebook instance lifecycle configuration samples