Total concurrency for Serverless Inference is not 200! Only 10!


Hello, can anybody please help?

According to the SageMaker documentation, "The total concurrency you can share between all serverless endpoints per Region in your account is 200".

https://sagemaker-examples.readthedocs.io/en/latest/serverless-inference/huggingface-serverless-inference/huggingface-text-classification-serverless-inference.html#:~:text=You%20can%20set%20the%20maximum,in%20your%20account%20is%20200.

However, today, when I tried to host a new Serverless Inference endpoint, I got this error: "An error occurred (ResourceLimitExceeded) when calling the CreateEndpoint operation: The account-level service limit 'Maximum total concurrency that can be allocated across all serverless endpoints' is 10"

So the quota for "Maximum total concurrency" of serverless endpoints has DROPPED from 200 to 10! This is a significant drop and will definitely affect many systems running on AWS.

hdvvip
Asked 2 months ago, viewed 205 times
2 answers

The maximum concurrency for a single endpoint is the number of simultaneous invocations that one particular endpoint can handle, and it can be set as high as 200. The maximum total concurrency across all serverless endpoints is the sum of the concurrent invocations that all endpoints combined can handle, and it is capped at 10 for the entire account. So even though a single endpoint can be configured for up to 200 invocations, the account-wide limit of 10 is the overriding constraint.
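To make the arithmetic concrete, here is a minimal sketch in plain Python. It assumes the account quota is enforced as the sum of the `MaxConcurrency` values of all serverless endpoints, which is how the error message's "allocated across all serverless endpoints" wording reads. The function names and the sample values are hypothetical and are not part of any AWS SDK.

```python
def remaining_concurrency(account_limit, existing_max_concurrency):
    """Concurrency still unallocated, given each existing endpoint's MaxConcurrency."""
    return account_limit - sum(existing_max_concurrency)


def can_create_endpoint(account_limit, existing_max_concurrency, requested):
    """True if a new endpoint requesting `requested` concurrency fits under the limit."""
    return requested <= remaining_concurrency(account_limit, existing_max_concurrency)


ACCOUNT_LIMIT = 10   # value reported in the ResourceLimitExceeded error
existing = [4, 4]    # hypothetical: two endpoints already deployed

print(can_create_endpoint(ACCOUNT_LIMIT, existing, 5))  # 8 + 5 exceeds 10
print(can_create_endpoint(ACCOUNT_LIMIT, existing, 2))  # 8 + 2 fits exactly
```

Under this assumption, a per-endpoint setting anywhere near 200 can never be deployed while the account total stays at 10.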

Expert
Answered 2 months ago

I can confirm that the Service Quotas value for total concurrency of Serverless Inference has also dropped from 200 to 10!!! (Screenshot: Serverless Quotas)
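For anyone who wants to check the value in their own account without the console, the following is a rough sketch using the boto3 `service-quotas` client. The quota name is copied from the error message in the question. Whether Service Quotas lists it under exactly that name is an assumption, as is the region in the usage example. `find_quota` and `fetch_sagemaker_quotas` are hypothetical helper names.

```python
QUOTA_NAME = (
    "Maximum total concurrency that can be allocated "
    "across all serverless endpoints"
)  # copied from the error message; assumed to match the Service Quotas name


def find_quota(quotas, name=QUOTA_NAME):
    """Return the Value of the quota called `name`, or None if it is not listed."""
    for quota in quotas:
        if quota.get("QuotaName") == name:
            return quota.get("Value")
    return None


def fetch_sagemaker_quotas(region_name):
    """Collect all applied SageMaker quotas for one region (needs AWS credentials)."""
    import boto3  # imported lazily so find_quota can be used without boto3

    client = boto3.client("service-quotas", region_name=region_name)
    paginator = client.get_paginator("list_service_quotas")
    quotas = []
    for page in paginator.paginate(ServiceCode="sagemaker"):
        quotas.extend(page["Quotas"])
    return quotas


# Usage (the region is only an example):
# print(find_quota(fetch_sagemaker_quotas("us-east-1")))
```

If the lookup returns `None`, the quota is probably listed under a different name in that region, and the Service Quotas console is the place to confirm it.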

hdvvip
Answered 2 months ago
