Does usage of prompt caching impact utilization of the tokens-per-minute service limit?


Does prompt caching count against our per-minute token limit?

For example, suppose we have a limit of 100k tokens per minute and make 5 API calls, each with 20k tokens of cache hits and 5k tokens of new data. That is 125k tokens in total, but 100k of them are cache hits.

Will service limits impact our API calls in this scenario?

1 Answer

Prompt caching in Amazon Bedrock does not consume the tokens-per-minute service limit in the way you've described. When you use prompt caching, only the new tokens processed are charged and counted against your token limits; the cached tokens are not.

In your example scenario:

  • You have a limit of 100k tokens per minute
  • You make 5 API calls
  • Each call has 20k tokens in cache hits and 5k new tokens

In this case, only 25k tokens count against your limit (5k new tokens × 5 calls), not 125k. The 20k cached tokens per call (100k in total) do not count against your token limit.

So, in this scenario, you would be well within your 100k tokens per minute limit, using only 25k of it. The service limits would not impact your API calls in this case.
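
If it helps to see the accounting spelled out, here is a minimal sketch of that arithmetic in Python, assuming (per the behavior described above) that only new, non-cached input tokens count against the quota:

```python
# Minimal sketch of the quota arithmetic above, assuming only new
# (non-cached) input tokens count against the tokens-per-minute quota.
TPM_LIMIT = 100_000               # tokens-per-minute service quota
CALLS_PER_MINUTE = 5
CACHED_TOKENS_PER_CALL = 20_000   # cache hits: not counted against the quota
NEW_TOKENS_PER_CALL = 5_000       # freshly processed: counted against the quota

total_tokens = CALLS_PER_MINUTE * (CACHED_TOKENS_PER_CALL + NEW_TOKENS_PER_CALL)
counted = CALLS_PER_MINUTE * NEW_TOKENS_PER_CALL

print(f"Total tokens handled:      {total_tokens:,}")        # 125,000
print(f"Counted against the quota: {counted:,}")             # 25,000
print(f"Within the 100k limit:     {counted <= TPM_LIMIT}")  # True
```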

Prompt caching is designed to reduce both costs and latency by avoiding reprocessing of repeated content. Cached tokens are charged at a reduced rate and do not count towards your regular token limits. This allows you to make more efficient use of your allocated resources and potentially make more API calls within the same limits.
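
You can also verify this per request: when prompt caching is enabled, the Converse API reports cached and new input tokens separately in the response's usage block. Below is a short sketch using boto3; the model ID is a placeholder, and you should confirm that your model and region support prompt caching before relying on it:

```python
# Sketch: enabling prompt caching with the Bedrock Converse API and inspecting
# the usage counters to see how many input tokens were served from cache.
import boto3

client = boto3.client("bedrock-runtime", region_name="us-east-1")

long_context = "..."  # large, repeated prompt prefix you want cached

response = client.converse(
    modelId="anthropic.claude-3-7-sonnet-20250219-v1:0",  # placeholder model ID
    system=[
        {"text": long_context},
        {"cachePoint": {"type": "default"}},  # cache everything up to this point
    ],
    messages=[
        {"role": "user", "content": [{"text": "New question for this call"}]},
    ],
)

usage = response["usage"]
print("New input tokens: ", usage["inputTokens"])
print("Read from cache:  ", usage.get("cacheReadInputTokens", 0))
print("Written to cache: ", usage.get("cacheWriteInputTokens", 0))
```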

It's important to note that the exact implementation and impact may vary depending on the specific model and region you're using, as prompt caching features can differ. Always refer to the most current AWS documentation for the most accurate information regarding your specific use case.
Sources

  • Cache Prompts Between Requests - Amazon Bedrock Prompt Caching - AWS
  • Effectively use prompt caching on Amazon Bedrock | AWS Machine Learning Blog
  • Prompt caching for faster model inference - Amazon Bedrock
  • Reduce costs and latency with Amazon Bedrock Intelligent Prompt Routing and prompt caching (preview) | AWS News Blog
