Assistance Required for Resolving ThrottlingException Errors with Bedrock API

Hi Team,

I hope this message finds you well. I am reaching out to seek assistance regarding an issue we are experiencing with the Bedrock API in our AWS Lambda function. Despite implementing retry logic and exponential backoff mechanisms, we continue to encounter ThrottlingException errors.

Issue details:

- Service: AWS Bedrock API
- Function: Lambda function invoking the Bedrock API
- Error: ThrottlingException - "Too many requests, please wait before trying again." (reached max retries: 4)
- Retry logic: exponential backoff with up to 5 retries, capping the maximum backoff time at 60 seconds
- Concurrency: ThreadPoolExecutor with a configurable number of worker threads for concurrent API requests

Example error log:

```json
{
  "statusCode": 500,
  "body": "{\"error\": \"An error occurred (ThrottlingException) when calling the InvokeModel operation (reached max retries: 4): Too many requests, please wait before trying again. You have sent too many requests. Wait before trying again.\"}"
}
```

Steps taken:

- Increased retry attempts: raised the retry count from 3 to 5.
- Adjusted backoff strategy: implemented exponential backoff, capping the maximum wait time at 60 seconds.
- Reduced concurrency: lowered the number of concurrent requests to reduce the rate of API calls.

Despite these measures, we continue to experience throttling, which impacts the performance and reliability of our application. We would like to understand the following:
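For reference, our backoff wrapper is, in simplified form, along these lines (a minimal sketch: the `ThrottlingError` stand-in, function names, and parameters are illustrative; the real code catches botocore's `ClientError` and checks for the `ThrottlingException` error code):

```python
import random
import time


class ThrottlingError(Exception):
    """Illustrative stand-in for Bedrock's ThrottlingException."""


def call_with_backoff(fn, max_retries=5, base=1.0, cap=60.0, sleep=time.sleep):
    """Retry fn() on throttling, with capped exponential backoff and full jitter."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except ThrottlingError:
            if attempt == max_retries:
                raise
            # Full jitter: wait a random time in [0, min(cap, base * 2**attempt)]
            sleep(random.uniform(0, min(cap, base * 2 ** attempt)))
```

The `sleep` parameter is injected only so the wrapper can be unit-tested without real delays.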

- Rate limits: What are the specific rate limits for the Bedrock API, and are there any quotas we should be aware of?
- Best practices: Are there any recommended best practices or configurations for handling a high volume of requests more effectively?
- Increased limits: Is it possible to request an increase in the rate limits or any specific quotas associated with our account?

We are eager to resolve this issue promptly and would greatly appreciate any guidance or assistance you can provide. If additional information or access to our logs is required, please let us know and we will be happy to provide it.

Thank you for your support and assistance.

1 Answer
Accepted Answer

Hi,

The runtime quotas for on-demand inference (requests per minute and token throughput) are not adjustable as of now: see https://docs.aws.amazon.com/bedrock/latest/userguide/quotas.html

For example, the limit is 1,000 requests per minute for Claude Haiku. If your budget allows, the way around it is Provisioned Throughput: see the Provisioned Throughput quotas at the bottom of the same page.

Your problem may be a snowball effect: too many initial requests and retries running in parallel will just generate more throttling exceptions and a bad user experience.

What you can try, to see if it helps in your specific use case, is a request manager: create a shared context in Redis where each Lambda with a query under way stores its ID, up to a maximum that you control. When a new request comes in, the Lambda checks the context to see whether the maximum has been reached. If not, it sends the request to the LLM. If the maximum is reached, the Lambda polls the context until another Lambda removes its entry because its request has completed.

That way, you will incrementally learn the maximum parallelism your account and region allow, minimizing throttling exceptions and their cascading effect. It will not increase your allowed throughput, but it at least ensures you make the best use of it.

There is NO guarantee that this mechanism will work in your specific use case. It helped in one of mine, which is why I suggest it here for a trial on your side.

Best,

Didier

EXPERT
answered 2 years ago
