I'm using SageMaker to host a multi-container endpoint (a serial inference pipeline) that chains a multi-model Triton container with a single-model post-processing container. I'm setting it up like so:
mme_container = {
    "Image": mme_triton_image_uri,
    "ModelDataUrl": model_data_url,
    "Mode": "MultiModel",
    "Environment": {
        "SAGEMAKER_TRITON_MODEL_LOAD_GPU_LIMIT": "0.8",
    },
}

torch_container = {
    "Image": "763104351884.dkr.ecr.us-east-1.amazonaws.com/huggingface-pytorch-inference:2.0.0-transformers4.28.1-gpu-py310-cu118-ubuntu20.04",
    "ModelDataUrl": "{bucket_url}/post_process.tar.gz",
}

instance_type = "ml.g5.xlarge"

response = sm_client.create_model(
    ModelName=serial_model_name,
    ExecutionRoleArn=role,
    Containers=[mme_container, torch_container],
)

create_endpoint_config_response = sm_client.create_endpoint_config(
    EndpointConfigName=endpoint_config_name,
    ProductionVariants=[
        {
            "InstanceType": instance_type,
            "InitialVariantWeight": 1,
            "InitialInstanceCount": 1,
            "ModelName": serial_model_name,
            "VariantName": "AllTraffic",
        }
    ],
)
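For context, after this config we kick off creation with create_endpoint, and since the first container runs in MultiModel mode, each request has to name its target model via TargetModel. A minimal sketch of the invocation request we'd build (the endpoint name "serial-endpoint" and artifact name "model_a.tar.gz" are placeholders, not from our actual setup):

```python
import json


def build_invoke_request(endpoint_name: str, target_model: str, payload: dict) -> dict:
    """Kwargs for sagemaker-runtime's invoke_endpoint(). On a MultiModel
    container, TargetModel selects which artifact under ModelDataUrl's S3
    prefix is lazily loaded and invoked."""
    return {
        "EndpointName": endpoint_name,
        "TargetModel": target_model,
        "ContentType": "application/json",
        "Body": json.dumps(payload),
    }


req = build_invoke_request("serial-endpoint", "model_a.tar.gz", {"inputs": []})

# Against live AWS clients this would be:
# sm_client.create_endpoint(EndpointName="serial-endpoint",
#                           EndpointConfigName=endpoint_config_name)
# boto3.client("sagemaker-runtime").invoke_endpoint(**req)
```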
The Dockerfile we use to extend the pre-built SageMaker Triton container:
# SageMaker Triton image
FROM 763104351884.dkr.ecr.us-east-1.amazonaws.com/sagemaker-tritonserver:22.07-py3
# FROM 301217895009.dkr.ecr.us-west-2.amazonaws.com/sagemaker-tritonserver:22.07-py3
LABEL com.amazonaws.sagemaker.capabilities.accept-bind-to-port=true
LABEL com.amazonaws.sagemaker.capabilities.multi-models=true
ENV SAGEMAKER_MULTI_MODEL=true
ENV SAGEMAKER_BIND_TO_PORT=8080
EXPOSE 8080
RUN pip install -U pip
RUN pip install --upgrade diffusers==0.25.0 transformers==4.36.1 accelerate numpy xformers scipy omegaconf torch torchvision pytorch_lightning pynvml
RUN pip install git+https://github.com/sberbank-ai/Real-ESRGAN.git
RUN apt-get update && apt-get install ffmpeg libsm6 libxext6 -y
The Errors:
The endpoint stays in the Creating status for about 1-2 hours, and during that time it follows this pattern:
- There are no logs from either container-1 or container-2 for the first ~15-30 minutes.
- When logs finally do appear, they come only from container-2, all the way until the endpoint fails.
Interestingly, when we use older PyTorch or Hugging Face Docker images, both containers load successfully.
We've tried various things, such as:
- Increasing the instance type to a 4xlarge
- Adding various environment variables to the MME container, such as: "SAGEMAKER_PROGRAM": "", "SAGEMAKER_SUBMIT_DIRECTORY": "", "SAGEMAKER_TRITON_MODEL_LOAD_GPU_LIMIT": "0.8", "SAGEMAKER_MULTI_MODEL": "true", "SM_LOG_LEVEL": "10"
Of everything we've tried, the only way we've managed to get logs from container-1 was by using the following Docker images:
763104351884.dkr.ecr.us-east-1.amazonaws.com/huggingface-pytorch-inference:1.7.1-transformers4.6.1-gpu-py36-cu110-ubuntu18.04
763104351884.dkr.ecr.us-east-1.amazonaws.com/pytorch-inference:1.3-gpu-py3
And while those pre-built Docker images did work with our custom extended sagemaker-triton image, they are too old to satisfy our model's requirements.
Any help as to debugging this issue would be greatly appreciated.
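For anyone reproducing this: we watch the per-container logs through CloudWatch, where SageMaker writes each container of the pipeline to its own stream under the endpoint's log group (that's where the container-1 / container-2 streams above come from). A rough sketch of how the streams can be tailed with boto3 (the endpoint name is a placeholder):

```python
def endpoint_log_group(endpoint_name: str) -> str:
    # SageMaker's convention for endpoint logs; each pipeline container
    # gets its own stream inside this group (e.g. ending in container-1).
    return f"/aws/sagemaker/Endpoints/{endpoint_name}"


def tail_endpoint_logs(endpoint_name: str, region: str = "us-east-1", limit: int = 50):
    import boto3  # deferred so the sketch parses without AWS deps installed

    logs = boto3.client("logs", region_name=region)
    group = endpoint_log_group(endpoint_name)
    streams = logs.describe_log_streams(
        logGroupName=group, orderBy="LastEventTime", descending=True
    )["logStreams"]
    for stream in streams:
        events = logs.get_log_events(
            logGroupName=group,
            logStreamName=stream["logStreamName"],
            limit=limit,
        )["events"]
        for event in events:
            print(stream["logStreamName"], event["message"])


# tail_endpoint_logs("serial-endpoint")  # placeholder endpoint name
```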