
Custom docker inference container failing on endpoint creation

Hi, I have a docker container containing a custom inference script that keeps failing on endpoint creation with the following error:

CannotStartContainerError. Please ensure the model container for variant variant-name-1 starts correctly when invoked with 'docker run <image> serve'

The container runs fine locally, and I don't know why it keeps failing in SageMaker. This is the command I use to run the Docker image locally for testing:

docker run --gpus all -v C:/Users/User/.aws:/root/.aws -p 8001:8080 e49ca31a46f5

For local testing, I mount my AWS credentials into the container because the inference script needs to write its output to S3.

How can I debug what I am doing wrong?

1 Answer

Hi,

It is not good practice (at all!) to embed credentials, such as those for S3 access, in your container image. Doing so tends to create all kinds of IAM issues, quite possibly including the one you are currently facing.

All the permissions you need should come from the execution role associated with your SageMaker container. See https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-roles.html for how to grant the S3 permissions, plus any others you need, in the execution role.
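As a minimal sketch of what relying on the execution role looks like in the inference script, assuming the script uses boto3: no keys are passed anywhere, so boto3 falls back to its default credential chain (the mounted ~/.aws files locally, the execution role on SageMaker). The function name `write_result` and its parameters are hypothetical, not taken from the original script.

```python
def write_result(bucket: str, key: str, body: bytes, s3_client=None) -> str:
    """Upload an inference result to S3 and return its S3 URI.

    No access keys are passed: when s3_client is omitted, boto3 resolves
    credentials through its default chain.
    """
    if s3_client is None:
        # Imported lazily so the function can be exercised with a stub client.
        import boto3

        s3_client = boto3.client("s3")
    s3_client.put_object(Bucket=bucket, Key=key, Body=body)
    return f"s3://{bucket}/{key}"
```

Passing a stub object as `s3_client` makes it possible to exercise the function without touching AWS at all.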

Best,

Didier

EXPERT
answered 2 years ago
  • Hi, the command I shared is only for local testing, where I mount the directory containing my credentials into the container. In the real deployment, yes, the SageMaker execution role will do that job. But what could be the reason for the deployment failure? The error is not at all clear.
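One difference worth checking: the local command shared in the question does not pass the `serve` argument, whereas the error message says SageMaker starts the container with `docker run <image> serve`. In Docker, arguments after the image name replace CMD, so if the image defines no ENTRYPOINT that accepts `serve`, Docker tries to execute `serve` as a program and the container cannot start. The sketch below is one possible entrypoint, not the original script: it accepts `serve` and answers GET /ping and POST /invocations on port 8080, which is what SageMaker hosting expects. The names `run_inference`, `InferenceHandler` and `is_serve_command` are hypothetical, and the inference step is a placeholder that echoes its input.

```python
import sys
from http.server import BaseHTTPRequestHandler, HTTPServer

PORT = 8080  # SageMaker hosting sends requests to this port


def run_inference(payload: bytes) -> bytes:
    """Placeholder for the real model call; it simply echoes the request body."""
    return payload


class InferenceHandler(BaseHTTPRequestHandler):
    def _reply(self, status: int, body: bytes = b"") -> None:
        self.send_response(status)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_GET(self) -> None:
        # /ping is the health check; any other path gets a 404.
        self._reply(200 if self.path == "/ping" else 404)

    def do_POST(self) -> None:
        if self.path != "/invocations":
            self._reply(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        self._reply(200, run_inference(self.rfile.read(length)))


def is_serve_command(argv: list[str]) -> bool:
    """Return True when the first argument after the program name is 'serve'."""
    return len(argv) > 1 and argv[1] == "serve"


if __name__ == "__main__":
    if is_serve_command(sys.argv):
        HTTPServer(("0.0.0.0", PORT), InferenceHandler).serve_forever()
    else:
        print("usage: <entrypoint> serve", file=sys.stderr)
```

To reproduce SageMaker's invocation locally, append `serve` to the end of the `docker run` command from the question. If that fails in the same way, the image's ENTRYPOINT is the likely culprit.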
