Base GPU image for SageMaker JupyterLab to build upon

0

What is the correct base GPU container image for SageMaker JupyterLab that I can build my container on? Is there an example Dockerfile for building on top of the base image? I have tried the options on this page, but they all gave me "internal failure" when starting SageMaker JupyterLab.

I was able to build my own containers for SageMaker Studio Classic and SageMaker training jobs, but SageMaker JupyterLab keeps giving me "internal failure" when I use my custom container.

3 Answers
2
Accepted Answer

Hello,

Please find below the SageMaker Distribution image used for JupyterLab. You can customise this image to suit your requirements.

[+] https://github.com/aws/sagemaker-distribution

Additionally, please find the steps and sample images for building a custom JupyterLab image.

[+] https://docs.aws.amazon.com/sagemaker/latest/dg/studio-updated-jl-admin-guide-custom-images.html

Hope this helps.

AWS
answered 9 months ago
EXPERT
reviewed 9 months ago
  • Thank you for the link to the container images. But the SageMaker Distribution images still give me "InternalFailure". You can reproduce the error by spinning up a SageMaker JupyterLab with public.ecr.aws/sagemaker/sagemaker-distribution:2.0-cpu on instance ml.t3.medium. I uploaded the image to my ECR without modification.

  • Hello Tim,

    Sorry for replying late.

    Were you able to run the container locally before deploying it on SageMaker Studio? You can run it using docker run -it <image_id> and access the JupyterLab instance running locally.

    If the image runs fine locally without any issues, I would suggest reviewing your app image config, especially to make sure you are creating it for the right app type, i.e. JupyterLab. It should look something like the following.

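    import boto3

    # Assumes AWS credentials and a default region are already configured.
    client = boto3.client('sagemaker')

    response = client.create_app_image_config(
        AppImageConfigName='jupytercustomimage-config',
        Tags=[
            {
                'Key': 'type',
                'Value': 'custom'
            },
        ],
        # JupyterLabAppImageConfig is the config block for the JupyterLab app type.
        JupyterLabAppImageConfig={
            'ContainerConfig': {
                'ContainerEnvironmentVariables': {  # optional
                    'type': 'custom'
                }
            }
        }
    )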
    
  • Thank you, and yes, I was able to start SageMaker JupyterLab. The non-obvious part was that calling jupyter-lab as the entry point was necessary. I thought the pre-built image would already include the entry point setup, but I was able to figure it out by following the doc you linked.
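
For readers landing on this thread, here is a minimal sketch of one way to make the entry point explicit. It assumes you prefer to set it in the app image config through the ContainerEntrypoint and ContainerArguments fields of ContainerConfig rather than with an ENTRYPOINT line in the Dockerfile, which is what the linked doc walks through. The helper names build_jupyterlab_app_image_config and create_config are hypothetical, and any server arguments your image needs are not shown here; take those from the linked guide.

```python
def build_jupyterlab_app_image_config(name, entrypoint=("jupyter-lab",), arguments=()):
    """Build keyword arguments for create_app_image_config (JupyterLab app type).

    Hypothetical helper for illustration; it only assembles a dict.
    """
    container_config = {"ContainerEntrypoint": list(entrypoint)}
    if arguments:
        # Only include arguments when the caller supplies some.
        container_config["ContainerArguments"] = list(arguments)
    return {
        "AppImageConfigName": name,
        "JupyterLabAppImageConfig": {"ContainerConfig": container_config},
    }


def create_config(kwargs):
    """Send the config to SageMaker. Requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the builder above works without boto3

    client = boto3.client("sagemaker")
    return client.create_app_image_config(**kwargs)


# Inspect the request before sending it; nothing is sent to AWS here.
kwargs = build_jupyterlab_app_image_config("jupytercustomimage-config")
print(kwargs)
```

Calling create_config(kwargs) would then register the config. Whether the entry point belongs in the Dockerfile or in the app image config is a design choice; keeping it in the config lets you reuse an unmodified image.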

1

You can take a look at the latest releases here: https://docs.aws.amazon.com/deep-learning-containers/latest/devguide/dlc-release-notes.html and https://github.com/aws/deep-learning-containers/releases/tag/v1.0-pt-ec2-2.4.0-tr-py311

For example, you can try 763104351884.dkr.ecr.us-west-2.amazonaws.com/pytorch-training:2.4.0-cpu-py311-ubuntu22.04-ec2-v1.0

AWS
EXPERT
answered 9 months ago
EXPERT
reviewed 9 months ago
  • The base images you linked are the same as the ones I linked. Those base containers do not work on SageMaker JupyterLab. They keep giving "internal failure". Have you or anyone from your team tried using a custom image on SageMaker JupyterLab? Could you share a Dockerfile that works?

    I have been trying different ways of building containers, but eventually they all give me "internal failure". This error message is so obscure that it gives me no direction for fixing it.

  • Did you look at the logs for the notebook instance for details? There could be multiple causes: the instance might not have sufficient resources to run the image properly (try a larger instance), or the issue may be with the notebook instance configuration. Without more details, it is tough to pinpoint the issue.
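
As a starting point for that kind of diagnosis, the sketch below reads the status and failure reason that SageMaker records for a Studio app, using the boto3 describe_app call. It assumes the JupyterLab app runs in a Studio space and is named "default"; that name, the helper app_failure_reason, and the placeholder IDs in the usage comment are assumptions for illustration, not values from this thread.

```python
def app_failure_reason(client, domain_id, space_name, app_name="default"):
    """Return (status, failure_reason) for a JupyterLab app in a Studio space.

    `client` is a boto3 SageMaker client, e.g. boto3.client("sagemaker").
    failure_reason is None when the response carries no FailureReason.
    """
    response = client.describe_app(
        DomainId=domain_id,
        SpaceName=space_name,
        AppType="JupyterLab",
        AppName=app_name,  # assumed app name; adjust if yours differs
    )
    return response.get("Status"), response.get("FailureReason")


# Usage (requires boto3 and AWS credentials; the IDs are placeholders):
#   import boto3
#   status, reason = app_failure_reason(
#       boto3.client("sagemaker"), "d-xxxxxxxxxxxx", "my-space"
#   )
#   print(status, reason)
```

The client is passed in rather than created inside the function so the helper can be exercised with a stub before pointing it at a real account.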

-1

For creating a custom GPU-enabled container for SageMaker JupyterLab, you can use the NVIDIA CUDA base images as a starting point. While there isn't a specific pre-built GPU image for SageMaker JupyterLab, you can build upon the NVIDIA CUDA images and add the necessary components for SageMaker compatibility.

To create a custom container for SageMaker JupyterLab:

  1. Start with an NVIDIA CUDA base image in your Dockerfile.
  2. Install the required dependencies, including PyTorch, NCCL, and other necessary libraries.
  3. Add the SageMaker-specific components and tools.
  4. Install any additional packages or libraries you need for your specific use case.

It's important to note that the process for creating a custom container for SageMaker JupyterLab may differ slightly from creating one for SageMaker Studio Classic or SageMaker training jobs. The "internal failure" you're experiencing when using your custom container with SageMaker JupyterLab could be due to missing components or incompatibilities specific to the JupyterLab environment.

To resolve this issue, you may need to:

  1. Ensure your Dockerfile includes all necessary components for SageMaker JupyterLab compatibility.
  2. Double-check that you're using the correct base image and versions of libraries compatible with SageMaker JupyterLab.
  3. Review the SageMaker documentation for any specific requirements or guidelines for custom containers in JupyterLab.
  4. Consider reaching out to AWS support for assistance with the "internal failure" error, as it may be related to specific configuration issues.

Remember that creating custom containers for SageMaker environments can be complex, and you may need to iterate on your Dockerfile to achieve full compatibility with SageMaker JupyterLab.
Sources
Create your own Docker container with the SageMaker distributed data parallel library - Amazon SageMaker
Bring your own SageMaker image - Amazon SageMaker

answered 9 months ago
  • This response is not really helpful. I am specifically asking for what correct base image to use.
