Hi,
What you may explore is provisioned concurrency for Amazon SageMaker Serverless Inference: see https://aws.amazon.com/blogs/machine-learning/announcing-provisioned-concurrency-for-amazon-sagemaker-serverless-inference/
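To make this concrete, here is a minimal sketch of how you might request provisioned concurrency when creating a serverless endpoint config with boto3. The model name, config name, and capacity values are hypothetical placeholders; the `ServerlessConfig` shape follows the SageMaker `create_endpoint_config` API, and `ProvisionedConcurrency` must not exceed `MaxConcurrency`.

```python
# Sketch: a ProductionVariant with a ServerlessConfig that reserves
# provisioned concurrency. Names and capacity values are hypothetical
# placeholders -- adjust them for your account.

def build_serverless_variant(model_name: str,
                             provisioned_concurrency: int = 5,
                             max_concurrency: int = 10,
                             memory_mb: int = 2048) -> dict:
    """Build a ProductionVariant dict for create_endpoint_config.

    ProvisionedConcurrency must be <= MaxConcurrency.
    """
    if provisioned_concurrency > max_concurrency:
        raise ValueError("ProvisionedConcurrency cannot exceed MaxConcurrency")
    return {
        "VariantName": "AllTraffic",
        "ModelName": model_name,
        "ServerlessConfig": {
            "MemorySizeInMB": memory_mb,
            "MaxConcurrency": max_concurrency,
            "ProvisionedConcurrency": provisioned_concurrency,
        },
    }

variant = build_serverless_variant("my-model")  # hypothetical model name
print(variant["ServerlessConfig"])

# Uncomment to actually create the endpoint config (requires AWS credentials):
# import boto3
# sm = boto3.client("sagemaker")
# sm.create_endpoint_config(
#     EndpointConfigName="my-serverless-pc-config",  # hypothetical name
#     ProductionVariants=[variant],
# )
```

Requests up to the provisioned concurrency are served by pre-warmed capacity; anything beyond it falls back to on-demand serverless behavior (up to `MaxConcurrency`).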
You can monitor it at a fine granularity with these CloudWatch metrics:
- ServerlessProvisionedConcurrencyExecutions – the number of concurrent runs handled by the endpoint
- ServerlessProvisionedConcurrencyUtilization – the number of concurrent runs divided by the allocated provisioned concurrency
- ServerlessProvisionedConcurrencyInvocations – the number of InvokeEndpoint requests handled by provisioned concurrency
- ServerlessProvisionedConcurrencySpilloverInvocations – the number of InvokeEndpoint requests not handled by provisioned concurrency, which are served by on-demand Serverless Inference instead
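As an illustration, a sketch of a boto3 `get_metric_statistics` request for the spillover metric above. The endpoint name is a hypothetical placeholder, and I'm assuming the standard SageMaker endpoint metric namespace and dimensions (`AWS/SageMaker`, `EndpointName`/`VariantName`); check your CloudWatch console for the exact dimensions your endpoint emits.

```python
# Sketch: build the get_metric_statistics parameters for the
# spillover metric. The endpoint name is a hypothetical placeholder.
from datetime import datetime, timedelta, timezone

def spillover_query(endpoint_name: str, variant_name: str = "AllTraffic",
                    hours: int = 1) -> dict:
    """Parameters for querying
    ServerlessProvisionedConcurrencySpilloverInvocations over the
    last `hours` hours, summed in 5-minute buckets."""
    now = datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/SageMaker",
        "MetricName": "ServerlessProvisionedConcurrencySpilloverInvocations",
        "Dimensions": [
            {"Name": "EndpointName", "Value": endpoint_name},
            {"Name": "VariantName", "Value": variant_name},
        ],
        "StartTime": now - timedelta(hours=hours),
        "EndTime": now,
        "Period": 300,          # 5-minute buckets
        "Statistics": ["Sum"],  # total spilled invocations per bucket
    }

params = spillover_query("my-serverless-endpoint")  # hypothetical endpoint

# Uncomment to run against your account (requires AWS credentials):
# import boto3
# cw = boto3.client("cloudwatch")
# resp = cw.get_metric_statistics(**params)
```

A sustained non-zero sum here suggests the provisioned concurrency is set too low for your traffic.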
Best,
Didier