Hello,
I've been trying to deploy multiple PyTorch models on one endpoint on SageMaker from a SageMaker Notebook. First I tested deploying single models on single endpoints to check that everything worked smoothly, and it did. I would create a PyTorchModel first:
import sagemaker
from sagemaker.pytorch import PyTorchModel
from sagemaker import get_execution_role
from sagemaker.multidatamodel import MultiDataModel
from sagemaker.serializers import JSONSerializer
from sagemaker.deserializers import JSONDeserializer
import boto3
role = get_execution_role()
sagemaker_session = sagemaker.Session()
pytorch_model = PyTorchModel(
    entry_point='inference.py',
    source_dir='code',
    role=role,
    model_data='s3://***/model/model.tar.gz',
    framework_version='1.11.0',
    py_version='py38',
    name='***-model',
    sagemaker_session=sagemaker_session
)
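For reference, the single-model deployment that worked looked roughly like this (a sketch; the instance type and endpoint name are placeholders for what I actually used):

# Single-model deployment that worked (sketch; endpoint name is a placeholder)
predictor = pytorch_model.deploy(
    initial_instance_count=1,
    instance_type='ml.m4.xlarge',
    serializer=JSONSerializer(),
    deserializer=JSONDeserializer(),
    endpoint_name='***-single-model-deployment'
)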
MultiDataModel inherits from the Model class, so I passed it the same PyTorchModel I used for single-model deployment.
Then I would define the MultiDataModel the following way:
models = MultiDataModel(
    name='***-multi-model',
    model_data_prefix='s3://***-sagemaker/model/',
    model=pytorch_model,
    sagemaker_session=sagemaker_session
)
All it should need is the S3 prefix under which the model artifacts are stored as tar.gz files (the same files used for single-model deployment), the previously defined PyTorchModel, a name, and a sagemaker_session.
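As a sanity check that the prefix is correct, the artifacts the MultiDataModel sees can be listed (a quick sketch using list_models(); I'd expect it to print the model.tar.gz keys under model_data_prefix):

# Quick check: list the artifacts visible under model_data_prefix
for artifact in models.list_models():
    print(artifact)  # expecting entries like 'model.tar.gz'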
To deploy it:
models.deploy(
    initial_instance_count=1,
    instance_type='ml.m4.xlarge',
    serializer=JSONSerializer(),
    deserializer=JSONDeserializer(),
    endpoint_name='***-multi-model-deployment',
)
The deployment goes well: there are no failures, and the endpoint is InService by the end of this step.
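I also confirmed the status separately (a sketch using boto3's describe_endpoint):

# Confirm the endpoint status (expecting 'InService')
sm_client = boto3.client('sagemaker')
desc = sm_client.describe_endpoint(EndpointName='***-multi-model-deployment')
print(desc['EndpointStatus'])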
However, an error occurs when I try to run inference against one of the models:
import json

body = {"url": "https://***image.jpg"}  # url to an image online
payload = json.dumps(body)

client = boto3.client('sagemaker-runtime')
response = client.invoke_endpoint(
    EndpointName="***-multi-model-deployment",
    ContentType="application/json",
    TargetModel="/model.tar.gz",
    Body=payload
)
This prompts an error message:
ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received server error (500) from model with message "{
"code": 500,
"type": "InternalServerException",
"message": "Failed to start workers for model ec1cd509c40ca81ffc3fb09deb4599e2 version: 1.0"
}
". See https://***.console.aws.amazon.com/cloudwatch/home?region=***#logEventViewer:group=/aws/sagemaker/Endpoints/***-multi-model-deployment in account ***** for more information.
The Cloudwatch logs show this error in particular:
2022-09-26T15:51:40,494 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - File "/opt/conda/lib/python3.8/site-packages/ts/model_service_worker.py", line 210, in <module>
2022-09-26T15:51:40,494 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - worker.run_server()
2022-09-26T15:51:40,494 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - File "/opt/conda/lib/python3.8/site-packages/ts/model_service_worker.py", line 181, in run_server
2022-09-26T15:51:40,495 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - self.handle_connection(cl_socket)
2022-09-26T15:51:40,495 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - File "/opt/conda/lib/python3.8/site-packages/ts/model_service_worker.py", line 139, in handle_connection
2022-09-26T15:51:40,495 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - service, result, code = self.load_model(msg)
2022-09-26T15:51:40,495 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - File "/opt/conda/lib/python3.8/site-packages/ts/model_service_worker.py", line 104, in load_model
2022-09-26T15:51:40,495 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - service = model_loader.load(
2022-09-26T15:51:40,495 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - File "/opt/conda/lib/python3.8/site-packages/ts/model_loader.py", line 151, in load
2022-09-26T15:51:40,495 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - initialize_fn(service.context)
2022-09-26T15:51:40,495 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - File "/opt/conda/lib/python3.8/site-packages/sagemaker_pytorch_serving_container/handler_service.py", line 51, in initialize
2022-09-26T15:51:40,495 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - super().initialize(context)
2022-09-26T15:51:40,495 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - File "/opt/conda/lib/python3.8/site-packages/sagemaker_inference/default_handler_service.py", line 66, in initialize
2022-09-26T15:51:40,495 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - self._service.validate_and_initialize(model_dir=model_dir)
2022-09-26T15:51:40,495 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - File "/opt/conda/lib/python3.8/site-packages/sagemaker_inference/transformer.py", line 162, in validate_and_initialize
2022-09-26T15:51:40,495 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - self._model = self._model_fn(model_dir)
2022-09-26T15:51:40,495 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - File "/opt/conda/lib/python3.8/site-packages/sagemaker_pytorch_serving_container/default_pytorch_inference_handler.py", line 73, in default_model_fn
2022-09-26T15:51:40,495 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - raise ValueError(
2022-09-26T15:51:40,496 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - ValueError: Exactly one .pth or .pt file is required for PyTorch models: []
It seems to have problems loading the model, saying exactly one .pth or .pt file is required, even though the invocation points to the exact model artifact present under that S3 prefix.
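For reference, the contents of one of those artifacts can be inspected like this (a sketch; 'model.tar.gz' is assumed to be a local copy of the same file that sits under the prefix):

# Inspect the artifact layout; the handler wants exactly one top-level .pth or .pt file
import tarfile

with tarfile.open('model.tar.gz', 'r:gz') as tar:
    for member in tar.getnames():
        print(member)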
Instead of giving the MultiDataModel a model, I also tried providing it with an ECR Docker image containing the same inference code, but I got the same error when invoking the endpoint.
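That attempt looked roughly like this (a sketch; the image URI is a placeholder for my ECR image):

# Variant using an ECR image instead of a Model object (image URI is a placeholder)
models = MultiDataModel(
    name='***-multi-model',
    model_data_prefix='s3://***-sagemaker/model/',
    image_uri='***.dkr.ecr.***.amazonaws.com/***:latest',
    role=role,
    sagemaker_session=sagemaker_session
)

I'm having a hard time fixing this issue, so any suggestions would be very helpful!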