How does SageMaker serve models for inference?


I am creating a SageMaker endpoint and want to test the multi-model feature. Based on the docs, a multi-model endpoint will download model artifacts and serve/host the models on the SageMaker endpoint. Can a SageMaker multi-model endpoint utilize a multi-GPU instance to serve multiple models at the same time?

Asked 10 months ago · 178 views
1 Answer

Hi,

Yes, SageMaker multi-model endpoints support GPUs: see https://docs.aws.amazon.com/sagemaker/latest/dg/multi-model-endpoints.html

Multi-model endpoints support hosting both CPU and GPU backed models. By using 
GPU backed models, you can lower your model deployment costs through increased 
usage of the endpoint and its underlying accelerated compute instances.

Multi-model endpoints also enable time-sharing of memory resources across your models. 
This works best when the models are fairly similar in size and invocation latency. 
When this is the case, multi-model endpoints can effectively use instances across all models. 
If you have models that have significantly higher transactions per second (TPS) or latency 
requirements, we recommend hosting them on dedicated endpoints.
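To make the setup concrete, here is a minimal sketch of the API requests involved in creating a GPU-backed multi-model endpoint. All names, ARNs, bucket paths, and the image URI are placeholders, not values from this thread; the requests are shown as plain dicts, with the actual boto3 calls (which require AWS credentials) left as comments.

```python
# Sketch: defining a multi-model endpoint on a GPU instance.
# "Mode": "MultiModel" tells SageMaker to load artifacts on demand from
# the S3 prefix given in ModelDataUrl, instead of a single model archive.
create_model_request = {
    "ModelName": "demo-mme-gpu",                                   # placeholder
    "ExecutionRoleArn": "arn:aws:iam::123456789012:role/DemoRole",  # placeholder
    "PrimaryContainer": {
        "Image": "<gpu-serving-container-image-uri>",  # placeholder URI
        "Mode": "MultiModel",
        "ModelDataUrl": "s3://demo-bucket/models/",  # prefix holding all model artifacts
    },
}

# One GPU instance type for the variant; all models on the endpoint
# time-share this instance (and its replicas, if scaled out).
endpoint_config_request = {
    "EndpointConfigName": "demo-mme-gpu-config",
    "ProductionVariants": [{
        "VariantName": "AllTraffic",
        "ModelName": "demo-mme-gpu",
        "InstanceType": "ml.g4dn.xlarge",  # example GPU instance type
        "InitialInstanceCount": 1,
    }],
}

# With boto3 you would then run (requires AWS credentials):
#   sm = boto3.client("sagemaker")
#   sm.create_model(**create_model_request)
#   sm.create_endpoint_config(**endpoint_config_request)
#   sm.create_endpoint(EndpointName="demo-mme-gpu",
#                      EndpointConfigName="demo-mme-gpu-config")

# At inference time, TargetModel selects which artifact under the
# ModelDataUrl prefix handles the request:
invoke_request = {
    "EndpointName": "demo-mme-gpu",
    "TargetModel": "model-a.tar.gz",  # relative to the S3 prefix above
    "ContentType": "application/json",
}
```

The key point is that you create the endpoint once; individual models are addressed per request via `TargetModel` rather than each getting its own endpoint.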

Yes, you can go multi-instance by using autoscaling: https://docs.aws.amazon.com/sagemaker/latest/dg/multi-model-endpoints-autoscaling.html

SageMaker multi-model endpoints fully support automatic scaling, which manages replicas 
of models to ensure models scale based on traffic patterns. We recommend that you 
configure your multi-model endpoint and the size of your instances based on Instance 
recommendations for multi-model endpoint deployments and also set up instance based 
auto scaling for your endpoint. The invocation rates used to trigger an auto-scale event are 
based on the aggregate set of predictions across the full set of models served by the endpoint. 
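Instance-based auto scaling for an endpoint variant is configured through the Application Auto Scaling API. A hedged sketch follows; the endpoint name, variant name, capacity bounds, and target value are placeholder choices, and the requests are again shown as dicts with the boto3 calls in comments.

```python
# Sketch: instance-based auto scaling for a multi-model endpoint variant.
# ResourceId format for SageMaker variants: endpoint/<name>/variant/<variant>.
resource_id = "endpoint/demo-mme-gpu/variant/AllTraffic"  # placeholder names

register_request = {
    "ServiceNamespace": "sagemaker",
    "ResourceId": resource_id,
    "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
    "MinCapacity": 1,   # placeholder bounds
    "MaxCapacity": 4,
}

# Target tracking on invocations per instance; for a multi-model endpoint
# this metric aggregates invocations across all models it serves.
policy_request = {
    "PolicyName": "demo-mme-invocations-scaling",
    "ServiceNamespace": "sagemaker",
    "ResourceId": resource_id,
    "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
    "PolicyType": "TargetTrackingScaling",
    "TargetTrackingScalingPolicyConfiguration": {
        "TargetValue": 70.0,  # placeholder invocations/instance target
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
        },
        "ScaleOutCooldown": 60,
        "ScaleInCooldown": 300,
    },
}

# With boto3 (requires AWS credentials):
#   aas = boto3.client("application-autoscaling")
#   aas.register_scalable_target(**register_request)
#   aas.put_scaling_policy(**policy_request)
```

Because the predefined metric counts invocations across every model on the endpoint, a traffic spike on any subset of models can trigger a scale-out of the shared instances.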

Hope it helps, Didier

AWS
Expert
Answered 10 months ago
