How does SageMaker serve models for inference?


I am creating a SageMaker endpoint and want to test the multi-model feature. Based on the docs, a multi-model endpoint downloads the model artifacts and hosts the models on the SageMaker endpoint. Can a SageMaker multi-model endpoint use multiple GPU instances to serve multiple models at the same time?

asked 9 months ago, 168 views
1 Answer

Hi,

Yes, SageMaker multi-model endpoints support GPUs. See https://docs.aws.amazon.com/sagemaker/latest/dg/multi-model-endpoints.html

Multi-model endpoints support hosting both CPU and GPU backed models. By using 
GPU backed models, you can lower your model deployment costs through increased 
usage of the endpoint and its underlying accelerated compute instances.

Multi-model endpoints also enable time-sharing of memory resources across your models. 
This works best when the models are fairly similar in size and invocation latency. 
When this is the case, multi-model endpoints can effectively use instances across all models. 
If you have models that have significantly higher transactions per second (TPS) or latency 
requirements, we recommend hosting them on dedicated endpoints.
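To make the "multiple models on one endpoint" idea concrete, here is a minimal sketch of how a single model is addressed at invocation time: the `TargetModel` parameter of `invoke_endpoint` names the artifact, relative to the S3 prefix the endpoint was created with. The endpoint name, artifact name, and JSON payload shape below are hypothetical; the content type your container expects may differ.

```python
import json


def build_invoke_request(endpoint_name, target_model, payload):
    """Build keyword arguments for sagemaker-runtime invoke_endpoint.

    target_model is the artifact path relative to the S3 prefix that the
    multi-model endpoint was configured with.
    """
    return {
        "EndpointName": endpoint_name,
        "TargetModel": target_model,
        "ContentType": "application/json",  # assumption: container accepts JSON
        "Body": json.dumps(payload),
    }


def invoke(endpoint_name, target_model, payload):
    """Send one request to one model on the endpoint (needs AWS credentials)."""
    import boto3  # imported lazily so the request builder works without boto3

    runtime = boto3.client("sagemaker-runtime")
    response = runtime.invoke_endpoint(
        **build_invoke_request(endpoint_name, target_model, payload)
    )
    return response["Body"].read()


# Hypothetical names, for illustration only.
request = build_invoke_request(
    "my-mme-endpoint", "model-a.tar.gz", {"inputs": [1, 2, 3]}
)
```

Calling `invoke` with a different `target_model` value sends the request to a different model behind the same endpoint.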

Yes, you can also run on multiple instances by using autoscaling: https://docs.aws.amazon.com/sagemaker/latest/dg/multi-model-endpoints-autoscaling.html

SageMaker multi-model endpoints fully support automatic scaling, which manages replicas 
of models to ensure models scale based on traffic patterns. We recommend that you 
configure your multi-model endpoint and the size of your instances based on Instance 
recommendations for multi-model endpoint deployments and also set up instance based 
auto scaling for your endpoint. The invocation rates used to trigger an auto-scale event are 
based on the aggregate set of predictions across the full set of models served by the endpoint. 
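As a rough sketch of what instance-based auto scaling can look like, the snippet below builds the two Application Auto Scaling requests (a scalable target plus a target-tracking policy on invocations per instance) and applies them with boto3. The endpoint name, variant name, capacity limits, target value, and policy name are all hypothetical; pick values that match your own traffic.

```python
def build_scaling_requests(endpoint_name, variant_name,
                           min_instances, max_instances, target_invocations):
    """Build requests for register_scalable_target and put_scaling_policy."""
    common = {
        "ServiceNamespace": "sagemaker",
        "ResourceId": f"endpoint/{endpoint_name}/variant/{variant_name}",
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
    }
    target = dict(common, MinCapacity=min_instances, MaxCapacity=max_instances)
    policy = dict(
        common,
        PolicyName=f"{endpoint_name}-invocations-target",  # hypothetical name
        PolicyType="TargetTrackingScaling",
        TargetTrackingScalingPolicyConfiguration={
            # Invocations per instance, aggregated over all models served
            # by the endpoint variant.
            "TargetValue": float(target_invocations),
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
            },
        },
    )
    return target, policy


def apply_scaling(target, policy):
    """Register the target and attach the policy (needs AWS credentials)."""
    import boto3  # imported lazily so the builders work without boto3

    client = boto3.client("application-autoscaling")
    client.register_scalable_target(**target)
    client.put_scaling_policy(**policy)


# Hypothetical values, for illustration only.
target, policy = build_scaling_requests("my-mme-endpoint", "AllTraffic", 1, 4, 100)
```

Passing `target` and `policy` to `apply_scaling` would let the endpoint grow from 1 to 4 instances as aggregate traffic rises.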

Hope it helps, Didier

AWS EXPERT, answered 9 months ago
