1 Answer
- Newest
- Most votes
- Most comments
0
Hi,
What you may explore is provisioned concurrency for Amazon SageMaker Serverless Inference: see https://aws.amazon.com/blogs/machine-learning/announcing-provisioned-concurrency-for-amazon-sagemaker-serverless-inference/
You can tune finely:
ServerlessProvisionedConcurrencyExecutions – The number of concurrent runs handled by the endpoint
ServerlessProvisionedConcurrencyUtilization – The number of concurrent runs divided by the allocated
provisioned concurrency
ServerlessProvisionedConcurrencyInvocations – The number of InvokeEndpoint requests handled by the
provisioned concurrency
ServerlessProvisionedConcurrencySpilloverInvocations – The number of InvokeEndpoint requests not handled
provisioned concurrency, which is handled by on-demand Serverless Inference
Best,
Didier
Relevant content
- Accepted Answerasked 2 years ago
- asked 9 months ago
- asked 6 months ago
- AWS OFFICIALUpdated a year ago
- AWS OFFICIALUpdated 2 years ago
- AWS OFFICIALUpdated 2 years ago
- AWS OFFICIALUpdated a year ago