Based on the currently available information, it is not possible to directly deploy an asynchronous Multi-Model Endpoint using the SageMaker Inference Toolkit and Multi-Model Server (MMS) as a single combined solution.
Multi-Model Endpoints (MMEs) and Asynchronous Inference are separate features in Amazon SageMaker, and they are not designed to work together out of the box. MMEs are primarily designed for real-time inference scenarios where multiple models can be hosted on the same endpoint, dynamically loaded and unloaded based on demand.
Asynchronous Inference, on the other hand, is designed for long-running inference jobs or handling large payload sizes, but it doesn't inherently support the dynamic model loading and unloading capabilities of MMEs.
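To make the difference concrete, here is a hedged boto3 sketch (endpoint names, S3 paths, and payloads below are placeholders, not values from your setup): an MME is invoked synchronously and selects the model per request via `TargetModel`, while an async endpoint is invoked by pointing it at a payload in S3 and writes its result back to S3.

```python
import boto3

runtime = boto3.client("sagemaker-runtime")

# Real-time multi-model endpoint: the model artifact to load is chosen
# per request via TargetModel (hypothetical endpoint and model names).
mme_response = runtime.invoke_endpoint(
    EndpointName="my-multi-model-endpoint",
    TargetModel="model-a.tar.gz",
    ContentType="application/json",
    Body=b'{"inputs": [1, 2, 3]}',
)

# Asynchronous endpoint: the payload is referenced by its S3 location and
# the result is written back to S3, so there is no per-request TargetModel.
async_response = runtime.invoke_endpoint_async(
    EndpointName="my-async-endpoint",
    InputLocation="s3://my-bucket/inputs/request-1.json",
    ContentType="application/json",
)
```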
The error you encountered when trying to create a Multi-Model Endpoint with an asynchronous configuration confirms this incompatibility. Your workaround of creating an asynchronous endpoint and adding multiple models to it, while functional, indeed doesn't provide the benefits of true MMEs such as dynamic model loading and unloading.
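For reference, a minimal boto3 sketch of the combination that gets rejected might look like the following (the model name, role ARN, image URI, and S3 paths are placeholders, not your actual configuration):

```python
import boto3

sm = boto3.client("sagemaker")

# Model registered in MultiModel mode, pointing at an S3 prefix that
# holds the individual model.tar.gz artifacts (hypothetical values).
sm.create_model(
    ModelName="mme-model",
    ExecutionRoleArn="arn:aws:iam::123456789012:role/SageMakerRole",
    PrimaryContainer={
        "Image": "<inference-image-uri>",
        "Mode": "MultiModel",
        "ModelDataUrl": "s3://my-bucket/models/",
    },
)

# Attaching AsyncInferenceConfig to an endpoint config that fronts a
# MultiModel-mode model is the combination that triggers the validation error.
sm.create_endpoint_config(
    EndpointConfigName="mme-async-config",
    ProductionVariants=[{
        "VariantName": "AllTraffic",
        "ModelName": "mme-model",
        "InstanceType": "ml.m5.xlarge",
        "InitialInstanceCount": 1,
    }],
    AsyncInferenceConfig={
        "OutputConfig": {"S3OutputPath": "s3://my-bucket/async-output/"}
    },
)
```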
As for the SageMaker Inference Toolkit with Multi-Model Server (MMS): it is primarily designed for building custom containers that support Multi-Model Endpoints, but it doesn't provide built-in support for asynchronous inference configurations.
While it's possible to adjust worker configurations in MMS, this doesn't directly translate to the ability to scale instances to zero or provide the full benefits of asynchronous inference.
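If you still want to tune MMS concurrency, one commonly used approach with the Inference Toolkit is to set the `SAGEMAKER_MODEL_SERVER_WORKERS` environment variable on the model; a hedged sketch (names, image URI, and S3 prefix are placeholders) could look like this, keeping in mind that it only changes the worker pool size, not the minimum instance count:

```python
import boto3

sm = boto3.client("sagemaker")

# Hypothetical model definition: the inference toolkit reads
# SAGEMAKER_MODEL_SERVER_WORKERS to size the MMS worker pool, but the
# endpoint still keeps at least one instance running.
sm.create_model(
    ModelName="mme-tuned-workers",
    ExecutionRoleArn="arn:aws:iam::123456789012:role/SageMakerRole",
    PrimaryContainer={
        "Image": "<custom-mms-image-uri>",
        "Mode": "MultiModel",
        "ModelDataUrl": "s3://my-bucket/models/",
        "Environment": {"SAGEMAKER_MODEL_SERVER_WORKERS": "2"},
    },
)
```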
For your use case, you might need to consider alternative architectures. If you require both the ability to host multiple models and handle asynchronous workloads, you might need to implement a custom solution. This could involve using separate endpoints for different models or implementing your own model loading/unloading logic within a custom container.
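To make the custom-container idea more concrete, here is one possible, purely illustrative shape for on-demand model loading with a small LRU cache; the class name, loading logic, and cache size are hypothetical and would need to be adapted to your framework and container.

```python
import os
import threading

# Hypothetical on-demand model cache for a custom container: each request
# names the model it wants, and the handler loads that model from a local
# directory (or S3) the first time it is requested, evicting the least
# recently used entry when the cache is full.
class ModelCache:
    def __init__(self, model_dir, max_models=4):
        self.model_dir = model_dir
        self.max_models = max_models
        self._models = {}   # model_name -> loaded model object
        self._order = []    # usage order for a simple LRU policy
        self._lock = threading.Lock()

    def _load(self, name):
        # Placeholder loader; a real container would deserialize the
        # framework-specific artifact here (e.g. torch.load, joblib.load).
        path = os.path.join(self.model_dir, name)
        return path  # stand-in for the loaded model

    def get(self, name):
        with self._lock:
            if name not in self._models:
                if len(self._models) >= self.max_models:
                    evicted = self._order.pop(0)
                    del self._models[evicted]
                self._models[name] = self._load(name)
            else:
                self._order.remove(name)
            self._order.append(name)
            return self._models[name]
```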
Alternatively, you could explore using SageMaker Serverless Inference, which can automatically scale to zero when not in use. However, this would not provide the full benefits of Multi-Model Endpoints in terms of dynamic model loading and unloading.
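A hedged sketch of a Serverless Inference endpoint config (names and values are placeholders) might look like the following; note that each serverless endpoint still fronts a single model, and capacity is expressed as memory size and max concurrency rather than instance count:

```python
import boto3

sm = boto3.client("sagemaker")

# Hypothetical serverless endpoint config: billing stops when there is no
# traffic, but there is no MME-style per-request model selection.
sm.create_endpoint_config(
    EndpointConfigName="serverless-config",
    ProductionVariants=[{
        "VariantName": "AllTraffic",
        "ModelName": "single-model",
        "ServerlessConfig": {
            "MemorySizeInMB": 2048,
            "MaxConcurrency": 5,
        },
    }],
)
```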
In summary, while both Multi-Model Endpoints and Asynchronous Inference are powerful features of SageMaker, they are currently not designed to be used together in a single endpoint configuration. Your specific use case may require a custom solution or a different architectural approach.
Sources
- Build Your Own Container for SageMaker Multi-Model Endpoints - Amazon SageMaker
- Multi-model endpoints - Amazon SageMaker
- How to create an Asynchronous Inference Endpoint - Amazon SageMaker
To help clarify, there's a nice matrix here of feature support that shows MME and Async can't work together (along with some other combinations): https://docs.aws.amazon.com/sagemaker/latest/dg/model-deploy-feature-matrix.html