Total concurrency for Serverless Inference is not 200! Only 10!


Hello, can anybody please help?

According to the SageMaker documentation, "The total concurrency you can share between all serverless endpoints per Region in your account is 200".

https://sagemaker-examples.readthedocs.io/en/latest/serverless-inference/huggingface-serverless-inference/huggingface-text-classification-serverless-inference.html#:~:text=You%20can%20set%20the%20maximum,in%20your%20account%20is%20200.

However, today, when I tried to host a new Serverless Inference endpoint, I got this error: "An error occurred (ResourceLimitExceeded) when calling the CreateEndpoint operation: The account-level service limit 'Maximum total concurrency that can be allocated across all serverless endpoints' is 10"

So the quota for "Maximum total concurrency" of serverless endpoints has DROPPED from 200 to 10! This is a significant drop and will definitely affect many systems running on AWS.

hdvvip
Asked 2 months ago · 206 views
2 answers

The maximum concurrency for a single endpoint is the limit on simultaneous invocations that one particular endpoint can handle, and it can be set to at most 200. The maximum total concurrency across all serverless endpoints is the sum of concurrent invocations that all endpoints combined can handle, and for your account it is capped at 10. So even though a single endpoint can be configured for up to 200 concurrent invocations, the account-wide limit of 10 is the overriding constraint.
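To make the interaction between the two limits concrete, here is a minimal sketch, assuming (as the error message's wording "total concurrency that can be allocated" suggests) that the account quota is checked against the sum of each endpoint's configured `MaxConcurrency` in its `ServerlessConfig`. The class `ServerlessEndpoint`, the function `can_create`, and the constant `PER_ENDPOINT_LIMIT` are hypothetical names for illustration only; this is not an AWS API and does not call SageMaker.

```python
from dataclasses import dataclass


@dataclass
class ServerlessEndpoint:
    # Hypothetical model of a serverless endpoint; max_concurrency mirrors
    # the MaxConcurrency field of the endpoint's ServerlessConfig.
    name: str
    max_concurrency: int


# Per-endpoint ceiling on MaxConcurrency, as cited in the answer above.
PER_ENDPOINT_LIMIT = 200


def can_create(existing, requested, account_quota):
    """Return True if `requested` fits under both the per-endpoint limit
    and the account-wide total concurrency quota (assumed here to be the
    sum of MaxConcurrency across all serverless endpoints)."""
    # A single endpoint must stay within 1..PER_ENDPOINT_LIMIT.
    if not 1 <= requested.max_concurrency <= PER_ENDPOINT_LIMIT:
        return False
    # Concurrency already allocated to existing serverless endpoints.
    allocated = sum(e.max_concurrency for e in existing)
    # The account quota caps the total, regardless of the per-endpoint limit.
    return allocated + requested.max_concurrency <= account_quota


existing = [ServerlessEndpoint("endpoint-a", 5)]
print(can_create(existing, ServerlessEndpoint("endpoint-b", 5), account_quota=10))
print(can_create([], ServerlessEndpoint("endpoint-c", 20), account_quota=10))
```

With an account quota of 10, an endpoint requesting `MaxConcurrency` of 20 is rejected even though 20 is well under the per-endpoint ceiling of 200, which matches the `ResourceLimitExceeded` error in the question.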

Expert
Answered 2 months ago

I can confirm that the Service Quotas value for total concurrency of Serverless Inference has also dropped from 200 to 10! (Screenshot: Serverless Quotas)

hdvvip
Answered 2 months ago
