Total concurrency for Serverless Inference is not 200! Only 10!


Hello, can anybody please help?

According to the SageMaker documentation, "The total concurrency you can share between all serverless endpoints per Region in your account is 200".

https://sagemaker-examples.readthedocs.io/en/latest/serverless-inference/huggingface-serverless-inference/huggingface-text-classification-serverless-inference.html#:~:text=You%20can%20set%20the%20maximum,in%20your%20account%20is%20200.

However, when I tried to create a new Serverless Inference endpoint today, I got this error: "An error occurred (ResourceLimitExceeded) when calling the CreateEndpoint operation: The account-level service limit 'Maximum total concurrency that can be allocated across all serverless endpoints' is 10"

So the quota for "Maximum total concurrency" of serverless endpoints has DROPPED from 200 to 10! This is a significant drop and will definitely affect many systems running on AWS.

hdvvip
Asked 2 months ago · 206 views
2 Answers

The maximum concurrency for a single endpoint is the number of simultaneous invocations that one particular endpoint can handle, and it can be set as high as 200. The maximum total concurrency across all serverless endpoints is the sum of the concurrency allocated to all endpoints combined, which the error message reports as capped at 10 for the entire account. So even though a single endpoint can be configured for up to 200 concurrent invocations, the account-level limit of 10 is the overriding constraint.
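To make the two limits concrete, here is a minimal sketch of the check described above. It is only an illustration, not SageMaker's actual validation logic: the function name `fits_account_quota` and its parameters are hypothetical, and the defaults of 200 and 10 are simply the numbers quoted in this thread.

```python
from typing import Dict


def fits_account_quota(
    requested: Dict[str, int],
    per_endpoint_cap: int = 200,   # per-endpoint limit quoted in this thread
    account_total_cap: int = 10,   # account-level limit from the error message
) -> bool:
    """Return True if every endpoint's requested max concurrency is within
    the per-endpoint cap AND the sum across endpoints is within the
    account-level cap. Names and defaults are illustrative assumptions."""
    for value in requested.values():
        if not 1 <= value <= per_endpoint_cap:
            return False
    return sum(requested.values()) <= account_total_cap


# Two endpoints at 5 each use the whole account budget of 10.
print(fits_account_quota({"endpoint-a": 5, "endpoint-b": 5}))
# One endpoint at 200 is valid per endpoint, but exceeds the account total.
print(fits_account_quota({"endpoint-a": 200}))
```

With an account total of 10, the per-endpoint ceiling of 200 never comes into play; the sum is what gets rejected.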

Expert
Answered 2 months ago

I confirm that the Service Quotas value for total concurrency of Serverless Inference has also dropped from 200 to 10! (Image: Serverless Quotas)
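For anyone who wants to check the applied value without opening the console, here is a rough sketch using the Service Quotas API through boto3. The helper name `find_quota_value` is hypothetical. It also assumes that the quota's name in Service Quotas contains the phrase from the error message, which I have not verified, so it matches on a substring instead of an exact name.

```python
from typing import Any, Optional


def find_quota_value(client: Any, service_code: str, name_fragment: str) -> Optional[float]:
    """Scan a service's quotas and return the value of the first one whose
    name contains name_fragment (case-insensitive), or None if none match.

    `client` is expected to be a boto3 "service-quotas" client."""
    paginator = client.get_paginator("list_service_quotas")
    for page in paginator.paginate(ServiceCode=service_code):
        for quota in page.get("Quotas", []):
            if name_fragment.lower() in quota.get("QuotaName", "").lower():
                return quota.get("Value")
    return None


# Example usage (requires boto3, AWS credentials, and a configured Region):
# import boto3
# client = boto3.client("service-quotas")
# print(find_quota_value(client, "sagemaker", "total concurrency"))
```

If this returns 10 for your account and Region, the limit in the error message is the applied quota, and a quota increase request is the way to raise it.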

hdvvip
Answered 2 months ago
