Hi,
Yes, SageMaker multi-model endpoints support GPUs: see https://docs.aws.amazon.com/sagemaker/latest/dg/multi-model-endpoints.html
Multi-model endpoints support hosting both CPU- and GPU-backed models. By using
GPU-backed models, you can lower your model deployment costs through increased
usage of the endpoint and its underlying accelerated compute instances.
Multi-model endpoints also enable time-sharing of memory resources across your models.
This works best when the models are fairly similar in size and invocation latency.
When this is the case, multi-model endpoints can effectively use instances across all models.
If you have models that have significantly higher transactions per second (TPS) or latency
requirements, we recommend hosting them on dedicated endpoints.
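To make the setup above concrete, here is a minimal sketch of the request parameters for a GPU-backed multi-model endpoint. The bucket, image URI, role ARN, and resource names are placeholders, not real resources; the dicts would be passed to the boto3 `sagemaker` client (`create_model`, `create_endpoint_config`).

```python
def build_mme_requests(model_name, image_uri, model_data_prefix, role_arn,
                       instance_type="ml.g5.xlarge"):
    """Build boto3 SageMaker request parameters for a multi-model endpoint.

    The key difference from a single-model endpoint is Mode='MultiModel':
    ModelDataUrl then points at an S3 *prefix* holding many model archives,
    and SageMaker loads/unloads individual models on demand.
    """
    create_model = {
        "ModelName": model_name,
        "PrimaryContainer": {
            "Image": image_uri,                  # serving container (placeholder)
            "Mode": "MultiModel",                # enables multi-model hosting
            "ModelDataUrl": model_data_prefix,   # S3 prefix, not one tarball
        },
        "ExecutionRoleArn": role_arn,
    }
    create_endpoint_config = {
        "EndpointConfigName": f"{model_name}-config",
        "ProductionVariants": [{
            "VariantName": "AllTraffic",
            "ModelName": model_name,
            "InitialInstanceCount": 1,
            "InstanceType": instance_type,       # GPU instance type
        }],
    }
    return create_model, create_endpoint_config

# All values below are illustrative placeholders.
model_req, config_req = build_mme_requests(
    model_name="my-mme",
    image_uri="<account>.dkr.ecr.<region>.amazonaws.com/my-serving-image",
    model_data_prefix="s3://my-bucket/models/",
    role_arn="arn:aws:iam::<account>:role/MySageMakerRole",
)
# To deploy: boto3.client("sagemaker").create_model(**model_req), then
# create_endpoint_config(**config_req) and create_endpoint(...).
```

The same container serves every model under the S3 prefix, which is why the docs recommend grouping models of similar size and latency on one endpoint.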
Yes, you can scale to multiple instances by using autoscaling: https://docs.aws.amazon.com/sagemaker/latest/dg/multi-model-endpoints-autoscaling.html
SageMaker multi-model endpoints fully support automatic scaling, which manages replicas
of models to ensure models scale based on traffic patterns. We recommend that you
configure your multi-model endpoint and the size of your instances based on Instance
recommendations for multi-model endpoint deployments and also set up instance based
auto scaling for your endpoint. The invocation rates used to trigger an auto-scale event are
based on the aggregate set of predictions across the full set of models served by the endpoint.
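As a sketch of that instance-based auto scaling setup, the parameters below target the `application-autoscaling` boto3 client (`register_scalable_target`, then `put_scaling_policy`). Endpoint and variant names, capacities, and the target value are illustrative placeholders you would tune to your own traffic.

```python
def build_scaling_config(endpoint_name, variant_name,
                         min_capacity=1, max_capacity=4,
                         target_invocations=70.0):
    """Build Application Auto Scaling parameters for a SageMaker endpoint
    variant. The SageMakerVariantInvocationsPerInstance metric counts
    invocations aggregated across all models served by the endpoint, which
    matches how multi-model endpoints trigger scale-out."""
    resource_id = f"endpoint/{endpoint_name}/variant/{variant_name}"
    scalable_target = {
        "ServiceNamespace": "sagemaker",
        "ResourceId": resource_id,
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
        "MinCapacity": min_capacity,
        "MaxCapacity": max_capacity,
    }
    scaling_policy = {
        "PolicyName": "mme-invocations-target-tracking",
        "ServiceNamespace": "sagemaker",
        "ResourceId": resource_id,
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            # Add instances when average invocations per instance exceed this.
            "TargetValue": target_invocations,
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance",
            },
        },
    }
    return scalable_target, scaling_policy

# Placeholder endpoint/variant names.
target, policy = build_scaling_config("my-mme-endpoint", "AllTraffic")
# To apply: client = boto3.client("application-autoscaling")
# client.register_scalable_target(**target); client.put_scaling_policy(**policy)
```

Because the metric aggregates across all models, a spike on any one model can scale the whole endpoint, which again argues for keeping similarly sized models together.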
Hope it helps, Didier