How does SageMaker serve models for inference?

I am creating a SageMaker endpoint and want to test the multi-model feature. Based on the docs, a multi-model endpoint downloads the model artifacts and hosts the models on the SageMaker endpoint. Can a SageMaker multi-model endpoint use multiple GPU instances to serve multiple models at the same time?

asked 10 months ago · 178 views
1 Answer
Hi,

Yes, SageMaker multi-model endpoints support GPU-backed instances: see https://docs.aws.amazon.com/sagemaker/latest/dg/multi-model-endpoints.html

Multi-model endpoints support hosting both CPU and GPU backed models. By using 
GPU backed models, you can lower your model deployment costs through increased 
usage of the endpoint and its underlying accelerated compute instances.

Multi-model endpoints also enable time-sharing of memory resources across your models. 
This works best when the models are fairly similar in size and invocation latency. 
When this is the case, multi-model endpoints can effectively use instances across all models. 
If you have models that have significantly higher transactions per second (TPS) or latency 
requirements, we recommend hosting them on dedicated endpoints.
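
To make the sharing concrete, here is a minimal sketch of how a client picks one of the hosted models at request time. It assumes boto3 and the `TargetModel` parameter of `invoke_endpoint`; the endpoint name, artifact name, and JSON payload shape are hypothetical placeholders, not values from your setup.

```python
import json


def build_invoke_args(endpoint_name, model_artifact, payload):
    """Build keyword arguments for sagemaker-runtime invoke_endpoint.

    model_artifact is the artifact path relative to the S3 prefix configured
    on the multi-model endpoint (hypothetical example: "model-a.tar.gz").
    """
    return {
        "EndpointName": endpoint_name,
        "TargetModel": model_artifact,  # selects which hosted model handles the request
        "ContentType": "application/json",  # assumption: the container accepts JSON
        "Body": json.dumps(payload),
    }


def invoke(endpoint_name, model_artifact, payload):
    """Send one request to the endpoint (needs AWS credentials and a live endpoint)."""
    import boto3  # imported here so the builder above works without boto3 installed

    runtime = boto3.client("sagemaker-runtime")
    response = runtime.invoke_endpoint(
        **build_invoke_args(endpoint_name, model_artifact, payload)
    )
    return response["Body"].read()


# Hypothetical names, shown only to illustrate the request shape.
args = build_invoke_args("my-mme-endpoint", "model-a.tar.gz", {"inputs": [1, 2, 3]})
print(args["TargetModel"])
```

Calling `invoke` with a different `model_artifact` on the same endpoint is how several models end up sharing one instance.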

Yes, you can also run on multiple instances by using autoscaling: https://docs.aws.amazon.com/sagemaker/latest/dg/multi-model-endpoints-autoscaling.html

SageMaker multi-model endpoints fully support automatic scaling, which manages replicas 
of models to ensure models scale based on traffic patterns. We recommend that you 
configure your multi-model endpoint and the size of your instances based on Instance 
recommendations for multi-model endpoint deployments and also set up instance based 
auto scaling for your endpoint. The invocation rates used to trigger an auto-scale event are 
based on the aggregate set of predictions across the full set of models served by the endpoint. 
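
As a rough sketch of what instance-based auto scaling could look like, the snippet below uses the Application Auto Scaling API through boto3. The endpoint name, variant name, capacity limits, target value, and cooldowns are all assumptions for illustration; tune them against your own traffic.

```python
def build_scaling_requests(endpoint_name, variant_name, min_instances, max_instances,
                           invocations_per_instance):
    """Build the two Application Auto Scaling requests for an endpoint variant."""
    resource_id = f"endpoint/{endpoint_name}/variant/{variant_name}"
    common = {
        "ServiceNamespace": "sagemaker",
        "ResourceId": resource_id,
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
    }
    target = dict(common, MinCapacity=min_instances, MaxCapacity=max_instances)
    policy = dict(
        common,
        PolicyName=f"{endpoint_name}-invocations-tracking",  # hypothetical name
        PolicyType="TargetTrackingScaling",
        TargetTrackingScalingPolicyConfiguration={
            # Aggregate invocations per instance across all models on the endpoint.
            "TargetValue": float(invocations_per_instance),
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
            },
            "ScaleInCooldown": 300,   # seconds, assumed value
            "ScaleOutCooldown": 60,   # seconds, assumed value
        },
    )
    return target, policy


def apply_scaling(target, policy):
    """Register the target and attach the policy (needs AWS credentials)."""
    import boto3  # imported here so the builder above works without boto3 installed

    autoscaling = boto3.client("application-autoscaling")
    autoscaling.register_scalable_target(**target)
    autoscaling.put_scaling_policy(**policy)


# Hypothetical values, shown only to illustrate the request shape.
target, policy = build_scaling_requests("my-mme-endpoint", "AllTraffic", 1, 4, 100)
print(target["ResourceId"])
```

Because the metric is aggregated over every model on the endpoint, a single busy model can trigger a scale-out for all of them.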

Hope it helps, Didier

AWS
EXPERT
answered 10 months ago
