Total concurrency for Serverless Inference is not 200! Only 10!

Hello, can anybody please help?

According to the SageMaker documentation, "The total concurrency you can share between all serverless endpoints per Region in your account is 200":

https://sagemaker-examples.readthedocs.io/en/latest/serverless-inference/huggingface-serverless-inference/huggingface-text-classification-serverless-inference.html#:~:text=You%20can%20set%20the%20maximum,in%20your%20account%20is%20200.

However, when I tried to create a new Serverless Inference endpoint today, I got this error: "An error occurred (ResourceLimitExceeded) when calling the CreateEndpoint operation: The account-level service limit 'Maximum total concurrency that can be allocated across all serverless endpoints' is 10"

So the quota for "Maximum total concurrency" of serverless endpoints has DROPPED from 200 to 10! This is a significant drop and will definitely affect many systems running on AWS.

hdvvip
asked 2 months ago, 194 views
2 Answers

These are two separate limits. The maximum concurrency for a single endpoint is the number of simultaneous invocations that one particular endpoint can handle, and it can be set as high as 200. The maximum total concurrency across all serverless endpoints is the sum of the concurrency that all endpoints combined can handle, and it is capped at 10 for the entire account. So even though a single endpoint can be configured for up to 200 concurrent invocations, the account-wide limit of 10 is the overriding constraint.
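
To make the interaction between the two limits concrete, here is a minimal sketch in plain Python. It assumes the account-level limit is checked against the sum of the concurrency allocated to every serverless endpoint, as the wording of the error message suggests. The function names and the `existing` values are hypothetical and are not part of any AWS SDK.

```python
def remaining_concurrency(account_limit, existing_max_concurrency):
    """Concurrency still unallocated under the account-level total."""
    return account_limit - sum(existing_max_concurrency)


def can_create_endpoint(account_limit, existing_max_concurrency, requested):
    """True if a new endpoint's requested concurrency fits in what is left."""
    left = remaining_concurrency(account_limit, existing_max_concurrency)
    return 0 < requested <= left


ACCOUNT_TOTAL = 10  # value reported in the ResourceLimitExceeded error
existing = [4, 3]   # hypothetical: concurrency of endpoints already deployed

print(remaining_concurrency(ACCOUNT_TOTAL, existing))     # 3
print(can_create_endpoint(ACCOUNT_TOTAL, existing, 3))    # True
print(can_create_endpoint(ACCOUNT_TOTAL, existing, 200))  # False
```

With an account total of 10, a request for 200 on one endpoint fails regardless of the per-endpoint maximum, which matches the error in the question.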

EXPERT
answered a month ago

I confirm that the Service Quotas value for the total concurrency of Serverless Inference has also dropped from 200 to 10! [Screenshot: Serverless Quotas]
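
For anyone who wants to check the value without the console, here is a rough sketch using the boto3 Service Quotas client. It assumes SageMaker quotas are listed under the service code `sagemaker` and that the quota name contains the phrase from the error message. Neither is confirmed in this thread, so treat both as assumptions. The helper names and the sample entries (including `L-EXAMPLE1`) are hypothetical.

```python
# Phrase taken from the error message; assumed to appear in the quota name.
QUOTA_PHRASE = "total concurrency that can be allocated across all serverless endpoints"


def find_quota(quotas, phrase=QUOTA_PHRASE):
    """Return the first quota dict whose QuotaName contains the phrase, else None."""
    for quota in quotas:
        if phrase.lower() in quota.get("QuotaName", "").lower():
            return quota
    return None


def fetch_sagemaker_quotas(region_name):
    """Collect the SageMaker quotas for one Region (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the rest of the sketch runs without it

    client = boto3.client("service-quotas", region_name=region_name)
    quotas = []
    # Assumption: "sagemaker" is the service code for SageMaker quotas.
    for page in client.get_paginator("list_service_quotas").paginate(ServiceCode="sagemaker"):
        quotas.extend(page["Quotas"])
    return quotas


# Hypothetical entries shaped like the items of a list_service_quotas response.
sample = [
    {"QuotaName": "Some other SageMaker quota", "QuotaCode": "L-EXAMPLE0", "Value": 2.0},
    {
        "QuotaName": "Maximum total concurrency that can be allocated across all serverless endpoints",
        "QuotaCode": "L-EXAMPLE1",
        "Value": 10.0,
    },
]

match = find_quota(sample)
print(match["Value"])  # 10.0
```

Against a real account you would call `find_quota(fetch_sagemaker_quotas("us-east-1"))`, substituting your own Region.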

hdvvip
answered 2 months ago
