Is it possible to achieve multi-threading in a SageMaker endpoint?


Context: I have a real-time SageMaker endpoint that essentially performs two KNN searches on two separate datasets. I'd like to parallelize the two KNN searches by creating a thread pool of some sort. Is that possible in SageMaker, and if so, is it recommended?

Another option is to have a Lambda function split the request into two invoke_endpoint() calls, so that the endpoint is triggered twice, once per dataset. However, to do so I would need a multi-worker or multi-host endpoint, which can drive up costs, so I'd like to explore multi-threading in the model itself first.

Thanks!!!

zachliu
asked 7 months ago, 361 views
1 Answer

Hi,

One option to explore is provisioned concurrency for Amazon SageMaker Serverless Inference; see https://aws.amazon.com/blogs/machine-learning/announcing-provisioned-concurrency-for-amazon-sagemaker-serverless-inference/
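As a rough illustration of what enabling provisioned concurrency could look like, here is a minimal sketch that builds a serverless production variant and passes it to the SageMaker `create_endpoint_config` call. The helper names, the model and config names, and the memory and concurrency values are all assumptions chosen for the example; the client is passed in so the sketch does not depend on credentials.

```python
def build_serverless_variant(model_name, memory_mb=4096,
                             max_concurrency=10, provisioned_concurrency=2):
    """Build a production variant dict for a serverless endpoint config.

    All default values are illustrative assumptions, not recommendations.
    """
    # Provisioned concurrency cannot exceed the variant's max concurrency.
    if provisioned_concurrency > max_concurrency:
        raise ValueError("provisioned_concurrency must be <= max_concurrency")
    return {
        "VariantName": "AllTraffic",
        "ModelName": model_name,
        "ServerlessConfig": {
            "MemorySizeInMB": memory_mb,
            "MaxConcurrency": max_concurrency,
            "ProvisionedConcurrency": provisioned_concurrency,
        },
    }


def create_serverless_endpoint_config(sagemaker_client, config_name,
                                      model_name, **variant_kwargs):
    """Create the endpoint config using a client supplied by the caller."""
    variant = build_serverless_variant(model_name, **variant_kwargs)
    return sagemaker_client.create_endpoint_config(
        EndpointConfigName=config_name,
        ProductionVariants=[variant],
    )
```

In practice the client would come from `boto3.client("sagemaker")`, and the resulting config would then be attached to an endpoint with `create_endpoint` or `update_endpoint`.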

You can monitor it, and tune the allocated concurrency accordingly, with these CloudWatch metrics:

ServerlessProvisionedConcurrencyExecutions – The number of concurrent runs handled by the endpoint
ServerlessProvisionedConcurrencyUtilization – The number of concurrent runs divided by the allocated provisioned concurrency
ServerlessProvisionedConcurrencyInvocations – The number of InvokeEndpoint requests handled by the provisioned concurrency
ServerlessProvisionedConcurrencySpilloverInvocations – The number of InvokeEndpoint requests not handled by provisioned concurrency, which are instead handled by on-demand Serverless Inference
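Separately, the in-model thread pool that the question asks about can be sketched as below. This is only a minimal illustration under stated assumptions: the two tiny datasets, the brute-force `knn_search` helper, and the `search_both` function are hypothetical stand-ins for whatever the real inference handler loads and calls. Note that threads only give a real speedup when the search runs in native code that releases Python's global interpreter lock (as many NumPy operations do); the pure-Python search shown here demonstrates the structure, not the speedup.

```python
from concurrent.futures import ThreadPoolExecutor
import heapq
import math

# Hypothetical stand-ins for the two datasets loaded at model start-up.
DATASET_A = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
DATASET_B = [(2.0, 2.0), (0.5, 0.5), (9.0, 9.0)]

# Create the pool once per worker process and reuse it across requests.
_POOL = ThreadPoolExecutor(max_workers=2)


def knn_search(dataset, query, k):
    """Brute-force KNN: return the indices of the k points closest to query."""
    distances = ((math.dist(point, query), idx)
                 for idx, point in enumerate(dataset))
    return [idx for _, idx in heapq.nsmallest(k, distances)]


def search_both(query, k=2):
    """Run the two KNN searches concurrently and collect both results."""
    future_a = _POOL.submit(knn_search, DATASET_A, query, k)
    future_b = _POOL.submit(knn_search, DATASET_B, query, k)
    # result() blocks until each search finishes and re-raises any exception.
    return {"dataset_a": future_a.result(), "dataset_b": future_b.result()}
```

A function like `search_both` would be called from the container's prediction handler. Because the pool lives inside a single worker process, this approach needs no extra hosts or workers, which is the cost concern raised in the question.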

Best,

Didier

AWS
EXPERT
answered 7 months ago
