Can somebody explain Bedrock Provisioned Throughput?

Hi. The on-demand tier of Claude Instant is limited to 400 requests per minute and 300,000 tokens per minute. I understand that in production, with enough customers, you might need higher rates to avoid throttling. If I go to Provisioned Throughput, I can add, for example, 1 model unit.

  • What exactly does a model unit mean?
  • How many tokens per minute can each model unit process? Can I modify the quotas of each model unit?
  • I guess if I purchase 1 model unit for a month, I cannot cancel it until the month ends. For a Claude Instant model unit the estimated costs are: 'Estimated hourly cost $39.60. Estimated daily cost $950.40. Estimated monthly cost $28,908.00'. What do these costs depend on? Will I be billed even if I do not use it?
  • When would I need more than 1 model unit?

I feel like the documentation on this topic is not clear at all, and since we are talking about big numbers, it's not something you can just 'play with'.

Thanks!

  • I just found a section which says 'For more information about what an MU specifies, contact your AWS account manager'. It also says that Provisioned Throughput quotas are adjustable, so it is not very clear. That's why I asked when I would need to purchase 2 MUs if quotas are adjustable.

1 Answer

Bedrock Provisioned Throughput allows you to reserve a certain amount of processing power for your application to avoid throttling and ensure consistent performance.

What does a model unit exactly mean?

Think of a model unit as a "portion" of computational resources dedicated to your application.

How many tokens per minute is each model unit capable of processing?

Each model unit provides a fixed amount of throughput for a specific model, expressed as a number of requests or tokens per minute. The exact figure varies by model and, as far as I can tell, is not listed in the public documentation; the user guide directs you to your AWS account manager for what an MU specifies for the model you are interested in.

Can I modify the quotas of each model unit?

You can't manually change how much work a model unit can handle. If you need more capacity, you'll need to add more model units.

What do these costs depend on? Will I be billed even if I do not use it?

The costs depend on the model, the number of model units you reserve, and the commitment term you choose. You are billed for the reserved capacity for as long as it is provisioned, whether or not you send any requests to it.

Once you purchase a model unit with a one-month commitment, you can't cancel it until the commitment term ends.
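To see how the console estimates quoted in the question relate to each other, here is a minimal Python sketch. It assumes the monthly figure uses 730 hours per month (365 * 24 / 12), which is consistent with the quoted numbers but is my assumption, not something taken from the AWS documentation. The function name and its parameters are hypothetical.

```python
HOURS_PER_DAY = 24
HOURS_PER_MONTH = 730  # assumption: 365 * 24 / 12, matches the quoted estimate


def provisioned_cost(hourly_rate_usd, model_units=1):
    """Estimate the cost of reserved capacity, billed whether used or not."""
    hourly = hourly_rate_usd * model_units
    return {
        "hourly": round(hourly, 2),
        "daily": round(hourly * HOURS_PER_DAY, 2),
        "monthly": round(hourly * HOURS_PER_MONTH, 2),
    }


# Hourly rate quoted in the question for 1 Claude Instant model unit.
costs = provisioned_cost(39.60)
print(costs)
```

With the $39.60 hourly rate this gives $950.40 per day and $28,908.00 per month, the same figures shown in the console. Since the charge is time-based, a second model unit simply doubles each number.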

When would I need more than 1 model unit?

You might need more than one model unit if your application needs to handle a higher workload than what a single unit can manage. This could be because of more requests or more complex processing needs.
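As a rough way to reason about this, the sketch below compares a peak workload against the on-demand limits quoted in the question and then estimates how many model units would cover it. The per-MU throughput is a placeholder: AWS does not publish it here, so `tpm_per_model_unit` is a hypothetical value you would have to get from your AWS account manager. The function names are also my own.

```python
import math

# On-demand limits for Claude Instant as quoted in the question.
ON_DEMAND_RPM = 400
ON_DEMAND_TPM = 300_000


def exceeds_on_demand(peak_rpm, peak_tpm):
    """True if the peak load would be throttled on the on-demand tier."""
    return peak_rpm > ON_DEMAND_RPM or peak_tpm > ON_DEMAND_TPM


def model_units_needed(peak_tpm, tpm_per_model_unit):
    """Smallest whole number of model units that covers the peak token rate."""
    return max(1, math.ceil(peak_tpm / tpm_per_model_unit))


# Hypothetical workload and hypothetical per-MU throughput.
peak_rpm, peak_tpm = 500, 900_000
tpm_per_model_unit = 500_000  # placeholder, not an AWS-published figure

if exceeds_on_demand(peak_rpm, peak_tpm):
    print(model_units_needed(peak_tpm, tpm_per_model_unit))
```

With these made-up numbers the workload exceeds the on-demand limits and one model unit would not cover the peak, so two would be needed. The real answer depends entirely on the throughput figure AWS gives you for your model.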

EXPERT
answered a month ago
EXPERT
reviewed a month ago
  • Thanks! So it is like launching an instance for the model of your choice. I do not see the computational resources or the tokens per minute of any model specified anywhere in the AWS documentation. I just found a section which says 'For more information about what an MU specifies, contact your AWS account manager'. It also says that Provisioned Throughput quotas are adjustable, so it is not very clear. That's why I asked when I would need to purchase 2 MUs if quotas are adjustable.

  • To better understand the costs, you can use the AWS Pricing Calculator at https://calculator.aws/#/createCalculator/bedrock. It breaks down pricing for the features you intend to use. Note that some models may not be available in the calculator yet. For those, refer to the pricing details at https://aws.amazon.com/bedrock/pricing/ and the runtime quotas listed at https://docs.aws.amazon.com/bedrock/latest/userguide/quotas.html#quotas-runtime.
