
Is it possible to deploy an asynchronous Multi-Model Endpoint using SageMaker Inference + Multi-Model Server?


Hi!

I've tried to create a Multi-Model Endpoint with an asynchronous configuration, but when I deploy the endpoint I get an error saying that Multi-Model Endpoints are not compatible with an asynchronous configuration. As a workaround, I created an asynchronous endpoint and added multiple models to it, making the endpoint believe it contains only one model. This isn't a good solution, though: I lose the advantages of a multi-model endpoint, such as unloading a model, and every new instance has to download all the models.

I also tried to create a MultiDataModel with an async config; it doesn't raise any error, but the configuration doesn't seem to be applied either, so instead of getting an error I just end up deploying a real-time MultiDataModel.

I found that I can build my own Multi-Model Server (MMS) using the sagemaker-inference-toolkit together with the multi-model-server package, but the documentation doesn't say whether an asynchronous configuration can be set; it only covers settings such as logging. I can set the number of workers, but I don't know whether that scales the number of instances down to zero; I suspect it does not. I find this approach very interesting because I can configure the endpoint however I want, but it seems I won't be able to apply an asynchronous configuration.

I don't know whether MMS is a feasible solution here. I could run the models on Lambda, or even on a Serverless Endpoint, but I wouldn't get as many benefits as with the instance types available for Asynchronous/Multi-Model endpoints, where I have access to more RAM, GPUs, etc.

Thanks everyone for your attention!

1 Answer

Based on the current information available, it is not possible to directly deploy an Asynchronous Multi-Model Endpoint using SageMaker-Inference and Multi-Model Server (MMS) as a single combined solution.

Multi-Model Endpoints (MMEs) and Asynchronous Inference are separate features in Amazon SageMaker, and they are not designed to work together out of the box. MMEs are primarily designed for real-time inference scenarios where multiple models can be hosted on the same endpoint, dynamically loaded and unloaded based on demand.

Asynchronous Inference, on the other hand, is designed for long-running inference jobs or handling large payload sizes, but it doesn't inherently support the dynamic model loading and unloading capabilities of MMEs.
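To illustrate how the two features are configured separately in the SageMaker Python SDK (the bucket names, image URI, and role ARN below are placeholders, not working values):

```python
from sagemaker.model import Model
from sagemaker.multidatamodel import MultiDataModel
from sagemaker.async_inference import AsyncInferenceConfig

# Real-time Multi-Model Endpoint: many artifacts under one S3 prefix,
# loaded and unloaded on demand.
mme = MultiDataModel(
    name="my-mme",
    model_data_prefix="s3://my-bucket/models/",
    image_uri="<inference-image-uri>",
    role="<execution-role-arn>",
)
mme.deploy(initial_instance_count=1, instance_type="ml.m5.xlarge")

# Asynchronous endpoint: a single model, with results written back to S3.
model = Model(
    image_uri="<inference-image-uri>",
    model_data="s3://my-bucket/models/model-a.tar.gz",
    role="<execution-role-arn>",
)
model.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.xlarge",
    async_inference_config=AsyncInferenceConfig(
        output_path="s3://my-bucket/async-output/"
    ),
)
```

Passing an `async_inference_config` to `MultiDataModel.deploy` is what triggers the incompatibility error you observed.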

The error you encountered when trying to create a Multi-Model Endpoint with an asynchronous configuration confirms this incompatibility. Your workaround of creating an asynchronous endpoint and adding multiple models to it, while functional, indeed doesn't provide the benefits of true MMEs such as dynamic model loading and unloading.

Regarding the use of SageMaker Inference Toolkit with Multi-Model Server (MMS), this is primarily designed for building custom containers that support Multi-Model Endpoints. However, it doesn't provide built-in support for asynchronous inference configurations.

While it's possible to adjust worker configurations in MMS, this doesn't directly translate to the ability to scale instances to zero or provide the full benefits of asynchronous inference.

For your use case, you might need to consider alternative architectures. If you require both the ability to host multiple models and handle asynchronous workloads, you might need to implement a custom solution. This could involve using separate endpoints for different models or implementing your own model loading/unloading logic within a custom container.
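A minimal sketch of such custom loading/unloading logic, assuming a simple LRU cache inside your own container (the cache and loader are hypothetical, not a SageMaker API):

```python
from collections import OrderedDict

class ModelCache:
    """Hypothetical LRU cache that loads models on demand and unloads
    the least recently used one when capacity is reached."""

    def __init__(self, capacity, loader):
        self.capacity = capacity
        self.loader = loader          # callable: model name -> model object
        self._models = OrderedDict()

    def get(self, name):
        if name in self._models:
            self._models.move_to_end(name)       # mark as recently used
        else:
            if len(self._models) >= self.capacity:
                self._models.popitem(last=False)  # unload the LRU model
            self._models[name] = self.loader(name)
        return self._models[name]

cache = ModelCache(capacity=2, loader=lambda n: f"loaded:{n}")
cache.get("a")
cache.get("b")
cache.get("c")  # capacity exceeded, so "a" is unloaded
```

This mimics, in a very reduced form, the dynamic load/unload behavior that real Multi-Model Endpoints provide out of the box.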

Alternatively, you could explore using SageMaker Serverless Inference, which can automatically scale to zero when not in use. However, this would not provide the full benefits of Multi-Model Endpoints in terms of dynamic model loading and unloading.
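For reference, a serverless configuration with the SageMaker Python SDK looks roughly like this (the values are placeholders you would tune for your models):

```python
from sagemaker.serverless import ServerlessInferenceConfig

# Serverless endpoints scale to zero automatically when idle.
serverless_config = ServerlessInferenceConfig(
    memory_size_in_mb=4096,  # 1024-6144, in 1 GB increments
    max_concurrency=5,
)
# model.deploy(serverless_inference_config=serverless_config)
```

Keep in mind the memory and concurrency limits above; serverless endpoints also do not offer GPU instances, which may matter given the instance-type benefits you mentioned.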

In summary, while both Multi-Model Endpoints and Asynchronous Inference are powerful features of SageMaker, they are currently not designed to be used together in a single endpoint configuration. Your specific use case may require a custom solution or a different architectural approach.
Sources
Build Your Own Container for SageMaker Multi-Model Endpoints - Amazon SageMaker
Multi-model endpoints - Amazon SageMaker
How to create an Asynchronous Inference Endpoint - Amazon SageMaker

answered 15 days ago
AWS
EXPERT
reviewed 15 days ago
