Thank you for your question about using prompt caching with Amazon Bedrock's Nova Pro model. I'd be happy to provide some guidance on how to properly structure your input and system text for prompt caching, as well as how to verify that it's working as expected.
First, let's look at how to structure your input text and system text for prompt caching:
- Input Text: Your current structure for the input text is correct. You've placed the cache checkpoint after the input text, which is the right approach. This allows the input text to be cached for potential reuse in future requests.
- System Text: In your current code, you're not applying caching to the system text. To cache the system text, you should structure it similarly to how you've done the input text. Here's how you can modify your code to cache the system text:
system = [ {"text": system_text}, {"cachePoint": {"type": "default"}} ] response = bedrock_client.converse( modelId=model_id, messages=[message], system=system, inferenceConfig={ "maxTokens": 2000, "temperature": 0, "topP": 0.0 } )
This change will allow the system text to be cached as well.
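For reference, here is a minimal end-to-end sketch that caches both the input text and the system text. It assumes boto3 is configured for Bedrock Runtime in a Region where prompt caching is available; the model ID, Region, and sample texts are placeholders, so substitute your own values:

```python
import boto3

# Assumptions: credentials and Region are configured for Bedrock Runtime and
# prompt caching is available for the model. The model ID below is an example.
bedrock_client = boto3.client("bedrock-runtime", region_name="us-east-1")
model_id = "amazon.nova-pro-v1:0"

system_text = "You are a helpful assistant..."   # long, static instructions
input_text = "Summarize the attached document."  # content you want cached

# Cache checkpoints go after the static content you want reused.
message = {
    "role": "user",
    "content": [
        {"text": input_text},
        {"cachePoint": {"type": "default"}},
    ],
}

system = [
    {"text": system_text},
    {"cachePoint": {"type": "default"}},
]

response = bedrock_client.converse(
    modelId=model_id,
    messages=[message],
    system=system,
    inferenceConfig={"maxTokens": 2000, "temperature": 0},
)
print(response["output"]["message"]["content"][0]["text"])
```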
To verify that prompt caching is working as expected, you can check the 'usage' section in the API response. This section reports how many input tokens were written to the cache and how many were read from it, alongside the regular input and output token counts. Here's how you can modify your code to extract and print this information:
```python
if 'usage' in response:
    usage = response['usage']
    print(f"Cache Read Input Tokens: {usage.get('cacheReadInputTokens', 0)}")
    print(f"Cache Write Input Tokens: {usage.get('cacheWriteInputTokens', 0)}")
    print(f"Input Tokens: {usage.get('inputTokens', 0)}")
    print(f"Output Tokens: {usage.get('outputTokens', 0)}")
```
When prompt caching is working, the first request should show a non-zero value for 'cacheWriteInputTokens' as the prompt is written to the cache, and subsequent calls with the same input should show non-zero values for 'cacheReadInputTokens', indicating that tokens are being read from the cache instead of being reprocessed.
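As a quick check, you could compare those two counters across back-to-back identical requests. The helper below is purely illustrative and reuses bedrock_client, model_id, system, and input_text from the sketch above:

```python
def converse_with_cache_stats(text):
    # Hypothetical helper: sends the same cached message structure as above
    # and returns the usage counters from the response.
    response = bedrock_client.converse(
        modelId=model_id,
        messages=[{
            "role": "user",
            "content": [{"text": text}, {"cachePoint": {"type": "default"}}],
        }],
        system=system,
        inferenceConfig={"maxTokens": 2000, "temperature": 0},
    )
    return response["usage"]

first = converse_with_cache_stats(input_text)
second = converse_with_cache_stats(input_text)   # sent within the cache TTL

# Expected pattern: the first call writes to the cache, the second reads from it.
print("first :", first.get("cacheWriteInputTokens", 0), "written,",
      first.get("cacheReadInputTokens", 0), "read")
print("second:", second.get("cacheWriteInputTokens", 0), "written,",
      second.get("cacheReadInputTokens", 0), "read")
```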
Remember that the cache has a five-minute Time To Live (TTL), so identical requests within this timeframe are more likely to result in cache hits. Also, to fully benefit from caching, try to structure your prompts with static portions that can be reused across multiple requests.
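For example, a long reference document can sit in front of a cache checkpoint while only the question changes between requests, so the static prefix is cached and reused. This is a sketch with illustrative names:

```python
# Static content (identical across requests) goes before the cache checkpoint;
# the dynamic question goes after it, so only the prefix is cached and reused.
def build_message(reference_document, question):
    return {
        "role": "user",
        "content": [
            {"text": reference_document},          # static, cacheable prefix
            {"cachePoint": {"type": "default"}},
            {"text": question},                    # changes on every request
        ],
    }
```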
Lastly, keep in mind that while cached tokens are charged at a reduced rate compared to new tokens, there may be additional costs for tokens written to the cache when using certain models.
If you're still not seeing the expected caching behavior after making these changes, double-check that prompt caching is enabled for your account and the specific model you're using, as it may be subject to preview or regional availability. Also note that each model has a minimum prompt size per cache checkpoint, so very short system or input texts may never trigger a cache write.
Sources
Clarification on Prompt Caching Usage with converse API in Amazon Bedrock | AWS re:Post
Prompt caching for faster model inference - Amazon Bedrock
Using the Converse API - Amazon Bedrock
Reduce costs and latency with Amazon Bedrock Intelligent Prompt Routing and prompt caching (preview) | AWS News Blog