
Clarification on Prompt Caching Usage with converse API in Amazon Bedrock


Hi AWS Team,

I was reading your blog post on Effectively Using Prompt Caching on Amazon Bedrock, and I noticed that some of the examples reference caching based on system, messages, and tool_choice.

I’d like to confirm whether the following usage pattern is compatible with prompt caching:

# Assumes model_id, messages, and system_text are defined elsewhere.
import boto3

bedrock_client = boto3.client("bedrock-runtime")

response = bedrock_client.converse(
    modelId=model_id,
    messages=messages,
    system=[{"text": system_text}],
    inferenceConfig={
        "maxTokens": 1000,
        "temperature": 0,
        "topP": 0.0
    }
)

Does this invocation format correctly trigger the prompt caching mechanism?

Are there any additional parameters or configurations needed to ensure the cache is utilized?

Is there a way to verify whether a cache hit occurred for a given request? We are trying prompt caching with Claude Sonnet 3.5/3.7 and Nova Pro/Lite/Micro.

Thanks in advance for your help!

Asked 8 months ago · 1839 views
1 Answer

Thank you for your question about prompt caching with the Amazon Bedrock Converse API. I'd be happy to clarify this for you.

The invocation format you've provided is compatible with prompt caching, but it doesn't explicitly set up cache checkpoints. To fully leverage prompt caching, you'll want to add cache checkpoints to your request.

For the Converse API, you can set cache checkpoints in the messages, system, or tools fields. Here's how you could modify your example to include a cache checkpoint in the system field:

response = bedrock_client.converse(
    modelId=model_id,
    messages=messages,
    system=[
        {"text": system_text},
        {"cachePoint": {"type": "default"}}  # cache checkpoint: marks everything above it for caching
    ],
    inferenceConfig={
        "maxTokens": 1000,
        "temperature": 0,
        "topP": 0.0
    }
)

By adding the cachePoint after your system text, you're indicating that the system prompt should be cached.

To verify whether a cache hit occurred for a given request, check the usage section of the API response. Alongside the regular input and output token counts, it reports cacheReadInputTokens (tokens served from the cache) and cacheWriteInputTokens (tokens written to the cache). A nonzero cacheReadInputTokens value indicates a cache hit.
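As a sketch, you could wrap that check in a small helper that reads the usage dictionary from the Converse response (the helper name is illustrative; the cache fields may be absent for models without caching support, so the code defaults them to zero):

```python
def summarize_cache_usage(usage):
    """Extract cache metrics from a Converse response's 'usage' block."""
    read = usage.get("cacheReadInputTokens", 0)
    write = usage.get("cacheWriteInputTokens", 0)
    return {"cache_read": read, "cache_write": write, "hit": read > 0}

# Example with a usage block shaped like a cache-hit response:
sample = {
    "inputTokens": 12,
    "outputTokens": 50,
    "totalTokens": 1262,
    "cacheReadInputTokens": 1200,
    "cacheWriteInputTokens": 0,
}
print(summarize_cache_usage(sample))
# {'cache_read': 1200, 'cache_write': 0, 'hit': True}
```

On the first request after a cache checkpoint you would expect cacheWriteInputTokens to be nonzero instead, since the prefix is being written to the cache rather than read from it.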

Prompt caching is supported for the models you mentioned (Anthropic Claude Sonnet 3.5 and 3.7, and Amazon Nova Pro, Nova Lite, and Nova Micro), so you should be able to use this feature with all of them.

Remember that the cache has a five-minute Time To Live (TTL), so identical requests within this timeframe are more likely to result in cache hits. Also, cached tokens are charged at a reduced rate compared to new tokens.

To make the most of prompt caching, consider structuring your prompts with static portions that can be reused across multiple requests. This approach can lead to significant improvements in latency and cost efficiency for your application.
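One way to structure this (a minimal sketch; the helper name and placeholder text are illustrative, and note that each model has a minimum token count before a prefix becomes cacheable) is to place a cachePoint inside the messages field after the static portion, so the long prefix is cached while only the final question varies per request:

```python
STATIC_CONTEXT = "Long reference document or instructions reused on every turn."

def build_messages(question):
    """Build a Converse messages list with a cache checkpoint after the static prefix."""
    return [{
        "role": "user",
        "content": [
            {"text": STATIC_CONTEXT},
            {"cachePoint": {"type": "default"}},  # everything above is eligible for caching
            {"text": question},
        ],
    }]

# Repeated calls within the cache TTL reuse the cached prefix; only the question changes:
messages = build_messages("Summarize section 2.")
```

Keeping the static text byte-identical across requests matters here: any change before the checkpoint invalidates the cached prefix.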

If you want to monitor your cache performance over time, you can use Amazon CloudWatch metrics associated with Amazon Bedrock to track cache-related metrics across your applications.

Sources
Prompt caching for faster model inference - Amazon Bedrock
Effectively use prompt caching on Amazon Bedrock | AWS Machine Learning Blog
Does usage of prompt caching impact utilization of the tokens-per-minute service limit? | AWS re:Post

Answered 8 months ago
