Total concurrency for Serverless Inference is not 200! Only 10!

Hello, can anybody please help?

According to the SageMaker documentation, "The total concurrency you can share between all serverless endpoints per Region in your account is 200".

https://sagemaker-examples.readthedocs.io/en/latest/serverless-inference/huggingface-serverless-inference/huggingface-text-classification-serverless-inference.html#:~:text=You%20can%20set%20the%20maximum,in%20your%20account%20is%20200.

However, today, when I tried to host a new Serverless Inference endpoint, I got this error: "An error occurred (ResourceLimitExceeded) when calling the CreateEndpoint operation: The account-level service limit 'Maximum total concurrency that can be allocated across all serverless endpoints' is 10"

--> So the quota for "Maximum total concurrency" of serverless endpoints DROPPED from 200 to 10! --> This is a significant drop and will definitely affect many systems running on AWS.
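For anyone who wants to check the value applied to their own account, here is a minimal sketch using the Service Quotas API through boto3. It assumes the quota is listed under the service code `sagemaker` with the same name that appears in the error message above; that match is an assumption, not something confirmed in this thread. The helper names `find_quota_value` and `fetch_sagemaker_quotas` are hypothetical.

```python
# Assumption: the Service Quotas entry uses the same name as the error message.
QUOTA_NAME = (
    "Maximum total concurrency that can be allocated "
    "across all serverless endpoints"
)


def find_quota_value(quotas, name=QUOTA_NAME):
    """Return the Value of the first quota whose QuotaName matches, else None."""
    for quota in quotas:
        if quota.get("QuotaName") == name:
            return quota.get("Value")
    return None


def fetch_sagemaker_quotas(region_name):
    """List the applied SageMaker quotas for one Region (needs AWS credentials)."""
    import boto3  # imported lazily so find_quota_value works without boto3

    client = boto3.client("service-quotas", region_name=region_name)
    paginator = client.get_paginator("list_service_quotas")
    quotas = []
    for page in paginator.paginate(ServiceCode="sagemaker"):
        quotas.extend(page["Quotas"])
    return quotas
```

Usage would look like `find_quota_value(fetch_sagemaker_quotas("us-east-1"))`, where the Region is only an example. A result of `None` means no quota with that exact name was found, in which case the name assumption above needs adjusting.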

2 Answers
These are two different limits. The maximum concurrency for a single endpoint is the number of simultaneous invocations that one particular endpoint can handle, and it can be set as high as 200. The maximum total concurrency across all serverless endpoints is the sum of the concurrent invocations that all endpoints combined can handle, and it is capped at 10 for the entire account. Even though a single endpoint can be configured for up to 200 invocations, the account-wide limit of 10 is the overriding constraint.
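The arithmetic behind the two limits can be sketched as follows. This is only an illustration of the explanation above, not AWS's actual enforcement logic: the function names are hypothetical, and the values 200 and 10 are the ones quoted in this thread, so the real quotas in a given account may differ. The `MemorySizeInMB` and `MaxConcurrency` keys are those of the `ServerlessConfig` structure used when creating an endpoint configuration.

```python
PER_ENDPOINT_MAX = 200  # per-endpoint ceiling quoted in this thread
ACCOUNT_TOTAL = 10      # account-level total reported in the error message


def remaining_concurrency(existing, account_total=ACCOUNT_TOTAL):
    """Concurrency still unallocated, given MaxConcurrency of existing endpoints."""
    return max(account_total - sum(existing), 0)


def can_create(existing, requested, account_total=ACCOUNT_TOTAL,
               per_endpoint_max=PER_ENDPOINT_MAX):
    """True only if the request satisfies both limits at once."""
    if not 1 <= requested <= per_endpoint_max:
        return False
    return sum(existing) + requested <= account_total


def serverless_config(memory_mb, max_concurrency):
    """Build a ServerlessConfig dict for an endpoint configuration variant."""
    return {"MemorySizeInMB": memory_mb, "MaxConcurrency": max_concurrency}


# Two existing endpoints already hold 4 + 3 = 7 of the 10 available.
existing = [4, 3]
print(remaining_concurrency(existing))  # 3
print(can_create(existing, 3))          # True: 7 + 3 <= 10
print(can_create(existing, 5))          # False: 7 + 5 > 10
```

With an account total of 10, a request for `MaxConcurrency=200` on a single endpoint would pass the per-endpoint check but fail the account-level one, which matches the `ResourceLimitExceeded` error in the question.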

EXPERT
answered 2 months ago
I can confirm that the Service Quotas value for the total concurrency of Serverless Inference has also dropped from 200 to 10! [Screenshot: Serverless Quotas]

hdvvip
answered 2 months ago