
Clarification on Prompt Caching Usage with converse API in Amazon Bedrock


Hi AWS Team,

I was reading your blog post on Effectively Using Prompt Caching on Amazon Bedrock, and I noticed that some of the examples reference caching based on system, messages, and tool_choice.

I’d like to confirm whether the following usage pattern is compatible with prompt caching:

response = bedrock_client.converse(
    modelId=model_id,
    messages=messages,
    system=[{"text": system_text}],
    inferenceConfig={
        "maxTokens": 1000,
        "temperature": 0,
        "topP": 0.0
    }
)

Does this invocation format correctly trigger the prompt caching mechanism?

Are there any additional parameters or configurations needed to ensure the cache is utilized?

Is there a way to verify whether a cache hit occurred for a given request? We are trying prompt caching with Claude Sonnet 3.5/3.7 and Nova Pro/Lite/Micro.

Thanks in advance for your help!

1 Answer

Thank you for your question about prompt caching with the Amazon Bedrock Converse API. I'd be happy to clarify this for you.

The invocation format you've provided is compatible with prompt caching, but it doesn't explicitly set up cache checkpoints. To fully leverage prompt caching, you'll want to add cache checkpoints to your request.

For the Converse API, you can set cache checkpoints in the messages, system, or tools fields. Here's how you could modify your example to include a cache checkpoint in the system field:

response = bedrock_client.converse(
    modelId=model_id,
    messages=messages,
    system=[
        {"text": system_text},
        {"cachePoint": {"type": "default"}}
    ],
    inferenceConfig={
        "maxTokens": 1000,
        "temperature": 0,
        "topP": 0.0
    }
)

By adding the cachePoint after your system text, you're indicating that the system prompt should be cached.
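If the reusable context lives in the conversation itself rather than the system prompt, a cache point can also be placed as a content block inside messages. The following is a minimal sketch assuming a long, static document_text string that you want cached ahead of a varying question (document_text and question are placeholder variables):

messages = [
    {
        "role": "user",
        "content": [
            {"text": document_text},                # large, static portion to cache
            {"cachePoint": {"type": "default"}},    # cache everything up to this point
            {"text": question}                      # small, varying portion after the checkpoint
        ]
    }
]

response = bedrock_client.converse(
    modelId=model_id,
    messages=messages,
    inferenceConfig={"maxTokens": 1000, "temperature": 0}
)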

To verify whether a cache hit occurred for a given request, you can check the usage section in the API response. It reports token counts for cache read and cache write operations, so a non-zero cache read count indicates a hit.
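As a rough sketch, the usage block typically includes cacheReadInputTokens and cacheWriteInputTokens when caching is active; checking those counters is a simple way to confirm a hit (the exact field names are worth double-checking against the current Converse API reference for your model):

usage = response.get("usage", {})
cache_read = usage.get("cacheReadInputTokens", 0)    # tokens served from the cache (a hit)
cache_write = usage.get("cacheWriteInputTokens", 0)  # tokens written to the cache (first request / miss)

if cache_read > 0:
    print(f"Cache hit: {cache_read} tokens read from cache")
elif cache_write > 0:
    print(f"Cache miss: {cache_write} tokens written to cache for reuse")
else:
    print("No cache activity reported for this request")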

Prompt caching is supported for the models you mentioned (Anthropic Claude 3.5 Sonnet, Claude 3.7 Sonnet, and Amazon Nova Pro, Nova Lite, and Nova Micro), so you should be able to use this feature with all of them.

Remember that the cache has a five-minute Time To Live (TTL) that resets each time the cached prefix is reused, so requests that share the same cached prefix within this window will result in cache hits. Also, tokens read from the cache are charged at a reduced rate compared to newly processed input tokens.

To make the most of prompt caching, consider structuring your prompts with static portions that can be reused across multiple requests. This approach can lead to significant improvements in latency and cost efficiency for your application.
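As a hedged illustration of that structure, the helper below keeps the large system prompt (and its cache checkpoint) byte-for-byte identical across calls and only varies the user turn, so a follow-up call within the TTL can read the system prefix from the cache (ask is a hypothetical helper name):

def ask(question):
    # The system list (static text + cachePoint) is identical on every call,
    # so only the user message changes between requests.
    return bedrock_client.converse(
        modelId=model_id,
        system=[
            {"text": system_text},
            {"cachePoint": {"type": "default"}}
        ],
        messages=[{"role": "user", "content": [{"text": question}]}],
        inferenceConfig={"maxTokens": 1000, "temperature": 0}
    )

first = ask("Summarize the policy document.")          # writes the system prefix to the cache
second = ask("List the key obligations it imposes.")   # should read the cached prefix if within the TTL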

If you want to monitor your cache performance over time, you can use Amazon CloudWatch metrics associated with Amazon Bedrock to track cache-related metrics across your applications.
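For example, a minimal boto3 sketch could pull per-model token-count metrics from the AWS/Bedrock namespace; note that the names of any cache-specific metrics are an assumption here, so verify them in the Bedrock CloudWatch documentation before relying on them:

import datetime
import boto3

cloudwatch = boto3.client("cloudwatch")

stats = cloudwatch.get_metric_statistics(
    Namespace="AWS/Bedrock",
    MetricName="InputTokenCount",   # substitute a cache-related metric name if one is documented
    Dimensions=[{"Name": "ModelId", "Value": model_id}],
    StartTime=datetime.datetime.utcnow() - datetime.timedelta(hours=24),
    EndTime=datetime.datetime.utcnow(),
    Period=3600,
    Statistics=["Sum"]
)

for point in sorted(stats["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], point["Sum"])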

Sources
Prompt caching for faster model inference - Amazon Bedrock
Effectively use prompt caching on Amazon Bedrock | AWS Machine Learning Blog
Does usage of prompt caching impact utilization of the tokens-per-minute service limit? | AWS re:Post

answered 7 months ago
