How can I be sure that manually installed libraries persist in Amazon SageMaker if my lifecycle configuration times out when I try to install the libraries?

When I try to install additional libraries, my lifecycle configuration scripts run for more than five minutes. This causes the Amazon SageMaker notebook instance to time out. I want to resolve this issue. Also, I want to be sure that my manually installed libraries persist between notebook instance sessions.

Short description

If a lifecycle configuration script runs for longer than five minutes, the script fails, and the notebook instance isn't created or started.

Use one of the following methods to resolve this issue:

  • nohup: The nohup ("no hangup") Linux command runs another command so that it ignores the hangup (SIGHUP) signal. When you add nohup at the beginning of a command and an ampersand (&) at the end, the command runs in the background. The lifecycle configuration script can then finish within five minutes while the packages continue to install. This method is a best practice for less technical users and is more appropriate as a short-term solution.
    Note: nohup alone doesn't run the command in the background. The shell that runs the lifecycle configuration script is terminated at the end of the script, so you need both parts: the & lets the script finish without waiting for the command, and nohup keeps the command running after the shell exits.
  • Create a custom, persistent Conda installation on the notebook instance's Amazon Elastic Block Store (Amazon EBS) volume: Run the on-create script in the terminal of an existing notebook instance. This script uses Miniconda to create a separate Conda installation on the EBS volume (/home/ec2-user/SageMaker/). Then, run the on-start script as a lifecycle configuration to make the custom environment available as a kernel in Jupyter. This method is a best practice for more technical users and is a better long-term solution.
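To see why nohup and & together avoid the timeout, the following minimal sketch times a script that starts a slow command in the background. It assumes a Linux shell, and `sleep 3` is only a stand-in for a slow step such as `pip install xgboost`. The script body returns right away while the background process keeps running.

```shell
#!/bin/bash
# Minimal sketch: "nohup <command> &" lets the calling script finish right away.
# "sleep 3" stands in for a slow step such as "pip install xgboost".
start=$(date +%s)
nohup sleep 3 > /dev/null 2>&1 &
bg_pid=$!
elapsed=$(( $(date +%s) - start ))
echo "Script body finished after ${elapsed}s"

# kill -0 sends no signal; it only checks that the process still exists.
if kill -0 "$bg_pid" 2>/dev/null; then
  echo "Background process $bg_pid is still running"
fi
```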

Resolution

Use one of the following methods to resolve lifecycle configuration timeouts.

Run the nohup command

Use the nohup command to run a long-running command in the background. The lifecycle configuration script then finishes within the five-minute limit, and the command continues to run after the script exits. Be sure to add the ampersand (&) at the end of the command.

Example:

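#!/bin/bash
set -e
# "&" runs pip in the background so that the script itself exits right away.
# "nohup" keeps pip running after the shell that runs the script exits.
nohup pip install xgboost &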

The background pip process exits after the libraries are installed. You aren't notified when this happens, but you can use the ps command to check whether the installation is still running.
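For a more targeted check than scanning the full ps output, one option is to have the lifecycle configuration script save the background PID right after the nohup line and look it up later. The following minimal sketch assumes a hypothetical PID file at /tmp/pip-install.pid and a hypothetical helper name, install_status. Instead of calling ps, it checks /proc/<PID>, which on Linux exists only while the process is alive and, unlike kill -0, also works for processes owned by root.

```shell
#!/bin/bash
# Minimal sketch: report whether a background installation is still running.
# Assumes (hypothetically) that the lifecycle configuration script saved the PID:
#   nohup pip install xgboost > /tmp/pip-install.log 2>&1 &
#   echo $! > /tmp/pip-install.pid
install_status() {
  local pid_file="${1:-/tmp/pip-install.pid}"
  local pid
  if [ ! -f "$pid_file" ]; then
    echo "unknown"            # no PID was recorded
    return 0
  fi
  pid=$(cat "$pid_file")
  # /proc/<PID> exists only while the process is alive (Linux-specific).
  if [ -n "$pid" ] && [ -d "/proc/$pid" ]; then
    echo "running"
  else
    echo "finished"
  fi
}
```

Usage from the notebook terminal: `install_status /tmp/pip-install.pid`.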

Note: You can also use the nohup command if your lifecycle configuration script times out in other scenarios, such as when you download large Amazon Simple Storage Service (Amazon S3) objects.
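The same pattern works for any long-running step. The following minimal sketch uses a hypothetical helper, run_detached, that starts a command with nohup and &, writes the command's output to a log file of your choice instead of nohup.out, and prints the PID. The S3 bucket, key, and log path in the usage comment are placeholders.

```shell
#!/bin/bash
# Minimal sketch: start a long-running command so that it outlives the lifecycle
# configuration script. run_detached is a hypothetical helper, not an AWS tool.
# Usage: run_detached LOG_FILE COMMAND [ARGS...]
run_detached() {
  local log_file="$1"
  shift
  # nohup ignores SIGHUP, & backgrounds the command, and the redirects keep
  # its output in LOG_FILE so the script doesn't wait on it.
  nohup "$@" > "$log_file" 2>&1 < /dev/null &
  echo "$!"
}

# Example (placeholder bucket, key, and log path):
# run_detached /home/ec2-user/SageMaker/s3-download.log \
#   aws s3 cp s3://amzn-s3-demo-bucket/large-object.tar.gz /home/ec2-user/SageMaker/
```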

Create a custom persistent Conda installation on the notebook instance's EBS volume

1.    In the terminal of an existing notebook instance, create a .sh file using your preferred editor.

Example:

vim custom-script.sh

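2.    Copy the contents of the on-create script into the .sh file. The on-create script is part of the persistent-conda-ebs sample in the Amazon SageMaker notebook instance lifecycle configuration samples. This script creates a new Conda environment in a custom Conda installation. This script also installs NumPy and Boto3 in the new Conda environment.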

Note: The notebook instance must have internet connectivity to download the Miniconda installer and ipykernel.
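For reference, the following minimal sketch shows the kind of steps that such an on-create script performs. It isn't the original sample. It assumes the Miniconda "latest" installer URL, the conda-forge channel, Python 3.10, and a hypothetical environment name, custom_python. The working directory matches the path that the on-start script reads later. Because it depends on the ec2-user account and internet access, run it only in a notebook instance terminal.

```shell
#!/bin/bash
set -e
# Minimal sketch of an on-create style script (not the original sample).
# Assumptions: Miniconda "latest" installer, conda-forge channel, Python 3.10,
# and the hypothetical environment name custom_python.
sudo -u ec2-user -i <<'EOF'
unset SUDO_UID
WORKING_DIR=/home/ec2-user/SageMaker/custom-miniconda
mkdir -p "$WORKING_DIR"

# Install Miniconda on the EBS volume so that it survives stop/start cycles.
wget -q https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O "$WORKING_DIR/miniconda.sh"
bash "$WORKING_DIR/miniconda.sh" -b -u -p "$WORKING_DIR/miniconda"
rm -f "$WORKING_DIR/miniconda.sh"

# Create a new environment with NumPy, then add ipykernel and Boto3 with pip.
source "$WORKING_DIR/miniconda/bin/activate"
KERNEL_NAME="custom_python"
conda create --yes --name "$KERNEL_NAME" -c conda-forge --override-channels python=3.10 numpy
conda activate "$KERNEL_NAME"
pip install --quiet ipykernel boto3
EOF
```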

3.    Mark the script as executable, and then run it.

Example:

chmod +x custom-script.sh
./custom-script.sh

4.    When installation is complete, stop the notebook instance.

5.    Copy the on-start script into a .sh file.

#!/bin/bash
set -e
# OVERVIEW
# This script makes the custom, persistent Conda installation on the notebook instance's EBS volume
# available in Jupyter. It uses the ipykernel package to register each Conda environment that the
# on-create script created as a Jupyter kernel.
#
# For another example, see:
# https://docs.aws.amazon.com/sagemaker/latest/dg/nbi-add-external.html#nbi-isolated-environment
sudo -u ec2-user -i <<'EOF'
unset SUDO_UID
WORKING_DIR=/home/ec2-user/SageMaker/custom-miniconda
source "$WORKING_DIR/miniconda/bin/activate"

# Register every environment under envs/ as a Jupyter kernel.
for env in "$WORKING_DIR"/miniconda/envs/*; do
    BASENAME=$(basename "$env")
    source activate "$BASENAME"
    python -m ipykernel install --user --name "$BASENAME" --display-name "Custom ($BASENAME)"
done

# Optionally, uncomment these lines to disable SageMaker-provided Conda functionality.
# echo "c.EnvironmentKernelSpecManager.use_conda_directly = False" >> /home/ec2-user/.jupyter/jupyter_notebook_config.py
# rm /home/ec2-user/.condarc
EOF

echo "Restarting the Jupyter server..."
# Run only the restart command for your platform; with set -e, a missing command fails the script.
if command -v systemctl > /dev/null 2>&1; then
    # Amazon Linux 2 (notebook-al2-v1)
    systemctl restart jupyter-server
else
    # Amazon Linux (notebook-al1-v1)
    initctl restart jupyter-server --no-wait
fi
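The loop in the on-start script registers one kernel for each directory under miniconda/envs and names the kernel after that directory. To preview which kernels to expect, the following minimal sketch uses a hypothetical helper, preview_kernels, that reproduces only that naming logic. It doesn't call ipykernel or change anything.

```shell
#!/bin/bash
# Minimal sketch: list the kernels that the on-start loop would register.
# preview_kernels is a hypothetical helper; it only mirrors the naming logic.
preview_kernels() {
  local envs_dir="${1:-/home/ec2-user/SageMaker/custom-miniconda/miniconda/envs}"
  local env name
  for env in "$envs_dir"/*; do
    # Skip files and the unexpanded pattern when envs/ is empty or missing.
    [ -d "$env" ] || continue
    name=$(basename "$env")
    echo "$name -> Custom ($name)"
  done
}
```

Usage: `preview_kernels` with no argument checks the default path from the on-start script.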

6.    On the stopped notebook instance, add the on-start script as a lifecycle configuration. This script makes the custom environment available as a kernel in Jupyter every time that you start the notebook instance.
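If you prefer the AWS CLI to the console, the following minimal sketch shows one way to do this step. It assumes that the AWS CLI is installed and configured and that the on-start script is saved locally. The helper names, notebook instance name, and lifecycle configuration name are hypothetical. The script content is base64-encoded because the lifecycle configuration API expects encoded content, and the sketch checks the 16384-character limit on that content before calling AWS.

```shell
#!/bin/bash
# Minimal sketch. encode_lifecycle_script and attach_on_start are hypothetical
# helpers; the aws sagemaker subcommands are standard AWS CLI commands.

# Lifecycle configuration content must be base64-encoded (max 16384 characters).
encode_lifecycle_script() {
  local encoded
  encoded=$(base64 -w0 "$1") || return 1   # -w0: no line wrapping (GNU coreutils)
  if [ "${#encoded}" -gt 16384 ]; then
    echo "encoded script is ${#encoded} characters; the limit is 16384" >&2
    return 1
  fi
  echo "$encoded"
}

# Create the lifecycle configuration and attach it to a stopped notebook instance.
attach_on_start() {
  local notebook="$1" lcc_name="$2" script="$3"
  local content
  content=$(encode_lifecycle_script "$script") || return 1
  aws sagemaker create-notebook-instance-lifecycle-config \
    --notebook-instance-lifecycle-config-name "$lcc_name" \
    --on-start "[{\"Content\":\"$content\"}]"
  aws sagemaker update-notebook-instance \
    --notebook-instance-name "$notebook" \
    --lifecycle-config-name "$lcc_name"
}

# Example (hypothetical names): attach_on_start my-notebook-instance custom-conda-on-start on-start.sh
```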

7.    Start the notebook instance, and then install your custom libraries in the custom environment.

For example, to install pyarrow, run the following in a notebook cell that uses the custom kernel:

import sys
!conda install --yes --prefix {sys.prefix} -c conda-forge pyarrow

If you get an error message that says that you need to update Conda, run the following commands. Then, try installing the custom libraries again.

!conda install -p "/home/ec2-user/anaconda3" "conda>=4.8" --yes
!conda install -p "/home/ec2-user/SageMaker/custom-miniconda/miniconda" "conda>=4.8" --yes

If you stop and then start your notebook instance, your custom Conda environment and libraries are still available. You don't have to install them again.

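Note: You can use Amazon CloudWatch Logs to troubleshoot issues with lifecycle configuration scripts. For notebook instances, the script execution logs are in the /aws/sagemaker/NotebookInstances log group, in the log stream [notebook-instance-name]/LifecycleConfigOnStart (or [notebook-instance-name]/LifecycleConfigOnCreate for on-create scripts).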
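You can also read the on-start log from any terminal that has the AWS CLI configured. The following minimal sketch assumes the notebook instance log group /aws/sagemaker/NotebookInstances and a log stream named <notebook-instance-name>/LifecycleConfigOnStart. The helper name show_on_start_log and the notebook instance name are hypothetical.

```shell
#!/bin/bash
# Minimal sketch: print the on-start lifecycle configuration log for a notebook instance.
# Assumes the AWS CLI is configured; show_on_start_log is a hypothetical helper name.
show_on_start_log() {
  local notebook="$1"
  aws logs get-log-events \
    --log-group-name /aws/sagemaker/NotebookInstances \
    --log-stream-name "${notebook}/LifecycleConfigOnStart" \
    --query 'events[].message' \
    --output text
}

# Example (hypothetical name): show_on_start_log my-notebook-instance
```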


Related information

Amazon SageMaker notebook instance lifecycle configuration samples

Lifecycle configuration best practices

Debugging lifecycle configurations

AWS OFFICIAL
Updated 2 years ago