ThrottlingExceptions while using on-demand Bedrock runtime for invoking Claude v2.1

Hi! We are currently using the Bedrock runtime to invoke the Claude v2.1 model on-demand. During our testing, we noticed Bedrock throwing ThrottlingExceptions indicating that too many requests had been made:

java.util.concurrent.ExecutionException: software.amazon.awssdk.services.bedrockruntime.model.ThrottlingException: Too many requests, please wait before trying again. You have sent too many requests.  Wait before trying again. (Service: BedrockRuntime, Status Code: 429, Request ID: )

We then cross-checked the quota guide, which lists the Claude v2.1 quotas as 100 requests/min and 200,000 tokens/min. We did not come anywhere near these limits: we barely reached 1 request per minute in that timeframe. See the CloudWatch screenshot: Bedrock CloudWatch

So my questions regarding this:

  1. Why did we get these ThrottlingExceptions?
  2. Can these ThrottlingExceptions occur even when individual quota limits are not reached just because throughput is shared across all customers?
  3. Is switching to provisioned throughput the only option to mitigate this issue?

Thanks for your time!

Osama
Asked 2 months ago · 1152 views
1 Answer
Accepted Answer

Hello, I understand that you are seeing ThrottlingExceptions when invoking the Claude v2.1 model in on-demand mode, even though your request rate is below the quota limits listed in the documentation. Let me address each of your questions below.

  1. Why did we get these ThrottlingExceptions?

In on-demand mode, requests are served from a capacity pool that is shared across multiple customers. When overall demand is high and the base model is processing a large number of requests, you can be throttled even though you are within your own account quotas.

  2. Can these ThrottlingExceptions occur even when individual quota limits are not reached just because throughput is shared across all customers?

Yes. Because on-demand models use a shared capacity pool, individual accounts may be throttled below their expected rates during periods of high demand across the service. The internal team is working on long-term fixes to expand capacity and address this issue, but we do not currently have an ETA.

  3. Is switching to provisioned throughput the only option to mitigate this issue?

No, you can also use retry mechanisms with exponential backoff to mitigate throttling. That said, consider Provisioned Throughput: it reserves capacity specifically for your account, so you avoid the inherent peaks and valleys of on-demand and maintain a consistent level of performance [1].
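As a rough illustration of the retry approach, here is a minimal sketch of exponential backoff with full jitter in plain Java. It is not taken from the Bedrock SDK: `BackoffRetry`, `ThrottledException`, `maxDelayMillis`, and `callWithBackoff` are hypothetical names, and `ThrottledException` is only a stand-in for the `ThrottlingException` shown in the question. The attempt count and delay values are assumptions to tune for your workload.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.function.Supplier;

public class BackoffRetry {

    /** Hypothetical stand-in for the SDK's ThrottlingException (HTTP 429). */
    public static class ThrottledException extends RuntimeException {
        public ThrottledException(String message) {
            super(message);
        }
    }

    /** Upper bound of the wait after failed attempt `attempt` (0-based): base * 2^attempt, capped. */
    static long maxDelayMillis(int attempt, long baseMillis, long capMillis) {
        // Bound the shift so the multiplication cannot overflow for sane base values.
        long exponential = baseMillis << Math.min(attempt, 20);
        return Math.min(capMillis, exponential);
    }

    /**
     * Runs `call`, retrying only when it is throttled. Between attempts it sleeps
     * for a random time in [0, maxDelayMillis] ("full jitter"), so that many
     * clients throttled at the same moment do not all retry together.
     */
    public static <T> T callWithBackoff(Supplier<T> call, int maxAttempts,
                                        long baseMillis, long capMillis) {
        for (int attempt = 0; ; attempt++) {
            try {
                return call.get();
            } catch (ThrottledException e) {
                if (attempt + 1 >= maxAttempts) {
                    throw e; // attempts exhausted: surface the throttling error
                }
                long bound = maxDelayMillis(attempt, baseMillis, capMillis);
                try {
                    Thread.sleep(ThreadLocalRandom.current().nextLong(bound + 1));
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("Interrupted while backing off", ie);
                }
            }
        }
    }
}
```

In real code, the `Supplier` would wrap your model invocation and the `catch` clause would target the SDK's exception instead. With the async client, as in the stack trace in the question, the throttling error arrives as the cause of an `ExecutionException`, so it would need to be unwrapped first. For example, `BackoffRetry.callWithBackoff(() -> invokeModel(prompt), 5, 200, 5000)` makes up to 5 attempts with waits capped at 5 seconds, where `invokeModel` is a placeholder for your own call.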

I hope you found this helpful. If you face any other issues or require further assistance, please reach out to AWS Support [2] along with your use case details, and we would be happy to assist you further. Thank you!

References:

[1] https://docs.aws.amazon.com/bedrock/latest/userguide/prov-throughput.html

[2] https://console.aws.amazon.com/support/home#/case/create

AWS
Answered 2 months ago
Reviewed by an expert 2 months ago
