SageMaker custom image started failing

Since yesterday, SageMaker Studio has been returning an error every time I try to open a notebook using a custom image. This is the error:

Failed to start kernel
Failed to launch app [**-**-**-ml-m5-large-309d4926425841270d******]. CustomImageError: SageMaker is unable to create an App using the specified ECR image [******.dkr.ecr.us-east-1.amazonaws.com/ecr-sagemaker-shared-services-image@sha256:*****41c91aa96c8ad69a5abea60dfa58edccf06f48f64189d9].

Inspect the cloudwatch logs for detailed diagnostic information. (Context: RequestId: 35dcea16-c511-40cd-ba69-****, TimeStamp: 1694525815.2390692, Date: Tue Sep 12 13:36:55 2023)

When I check the CloudWatch logs, I don't see any errors or anything unusual:

timestamp,message
1694525797638,'"+ CONDA_DIR=/opt/.sagemakerinternal/conda"
1694525797638,'"+ CONDA_ENV_FILTER=/opt/conda$"
1694525797638,'"+ command -v python"
1694525797638,'"+ [ 0 -eq 0 ]"
1694525797638,'"+ python -c from future import print_function;import sys; print(sys.prefix)"
1694525797638,'"+ SYSTEM_PYTHON_PREFIX=/opt/conda"
1694525797638,'"+ export JUPYTER_PATH=/opt/conda/share/jupyter/"
1694525797638,'"+ [ ! -f /opt/conda/share/jupyter/kernels/python3/kernel.json ]"
1694525797638,'"+ echo Using system included Python3 kernel."
1694525797638,'"+ export PATH=/opt/conda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/tmp/miniconda3/condabin:/tmp/anaconda3/condabin:/tmp/miniconda2/condabin:/tmp/anaconda2/condabin:/tmp/mambaforge/condabin"
1694525797638,'"+ export AWS_SAGEMAKER_PYTHONNOUSERSITE=0"
1694525797638,'"+ PYTHONNOUSERSITE=1 /opt/.sagemakerinternal/conda/bin/jupyter-kernelgateway --ip 0.0.0.0 --port 8888 --JupyterWebsocketPersonality.list_kernels=True --KernelSpecManager.ensure_native_kernel=False --MultiKernelManager.default_kernel_name= --KernelGatewayApp.kernel_spec_manager_class=nb_conda_kernels.CondaKernelSpecManager --CondaKernelSpecManager.env_filter=/opt/conda$"
1694525802125,Using system included Python3 kernel.
1694525802125,"[KernelGatewayApp] [nb_conda_kernels] enabled, 2 kernels found"
1694525806638,[KernelGatewayApp] Jupyter Kernel Gateway at http://0.0.0.0:8888
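For anyone reading an export like this, the sketch below is one way to convert the epoch-millisecond timestamps to UTC and scan the messages for failure keywords. It assumes the export has one `timestamp,message` row per line. The names `parse_export`, `suspicious_rows`, and the `SUSPICIOUS` keyword list are hypothetical helpers and not part of any AWS tooling. Only a few rows from the log above are included as sample data.

```python
from datetime import datetime, timedelta, timezone

# A few rows copied from the CloudWatch export in the question
# (timestamps are epoch milliseconds).
SAMPLE_EXPORT = """timestamp,message
1694525797638,'"+ command -v python"
1694525802125,Using system included Python3 kernel.
1694525802125,"[KernelGatewayApp] [nb_conda_kernels] enabled, 2 kernels found"
1694525806638,[KernelGatewayApp] Jupyter Kernel Gateway at http://0.0.0.0:8888
"""

# Hypothetical keyword list; adjust it to whatever counts as a failure for you.
SUSPICIOUS = ("error", "exception", "traceback", "failed")


def parse_export(text):
    """Return a list of (utc_datetime, message) tuples, skipping the header row."""
    rows = []
    for line in text.splitlines()[1:]:
        if not line.strip():
            continue
        # Split on the first comma only, because messages may contain commas.
        millis, _, message = line.partition(",")
        seconds, remainder = divmod(int(millis), 1000)
        when = datetime.fromtimestamp(seconds, tz=timezone.utc)
        rows.append((when + timedelta(milliseconds=remainder), message))
    return rows


def suspicious_rows(rows):
    """Return the rows whose message contains one of the SUSPICIOUS keywords."""
    return [row for row in rows if any(k in row[1].lower() for k in SUSPICIOUS)]


rows = parse_export(SAMPLE_EXPORT)
for when, message in rows:
    print(when.isoformat(), message)
print("suspicious rows:", suspicious_rows(rows))
```

On these sample rows the keyword scan comes back empty, which matches the observation that the container log itself shows nothing unusual.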

Also, if I start an instance with a default kernel and then switch to my custom image, it loads without errors. The error seems to happen only when starting a new instance.

Neither the image nor the config has been updated. The failures started without any change on my side.

Gonzalo
asked 8 months ago, 220 views
1 Answer

Hello,

Thank you for using Amazon SageMaker.

Regarding the 'CustomImageError' you are seeing when launching a Studio notebook with a custom image: between September 9, 9:00 AM and September 12, 4:30 PM PDT, we experienced an increased rate of failures when launching SageMaker Studio apps in the US-EAST-1 region. The issue has been resolved and the service is operating normally, so you should now be able to launch your Studio notebook with the custom image.
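As a quick sanity check, the `TimeStamp` value from the error message in the question can be compared against that window. This is a minimal sketch with two assumptions: the year 2023 is taken from the `Date` field in the error, and PDT is treated as a fixed UTC-7 offset. The helper name `in_incident_window` is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# PDT as a fixed UTC-7 offset (assumption: no DST change inside the window).
PDT = timezone(timedelta(hours=-7), "PDT")

# Assumption: the year 2023 comes from the Date field of the error message.
INCIDENT_START = datetime(2023, 9, 9, 9, 0, tzinfo=PDT)
INCIDENT_END = datetime(2023, 9, 12, 16, 30, tzinfo=PDT)


def in_incident_window(epoch_seconds):
    """Return True if the epoch timestamp falls inside the reported window."""
    moment = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return INCIDENT_START <= moment <= INCIDENT_END


# TimeStamp value copied from the error message in the question.
error_time = 1694525815.2390692
print(datetime.fromtimestamp(error_time, tz=timezone.utc).astimezone(PDT))
print(in_incident_window(error_time))
```

If the check returns True for the failing request, the error is consistent with the regional incident rather than with a problem in the image itself.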

If the issue persists, I recommend contacting AWS Premium Support by creating a support case [1] so that we can investigate further.

Reference:

[1] Creating support cases - https://docs.aws.amazon.com/awssupport/latest/user/case-management.html#creating-a-support-case

AWS
answered 8 months ago
