
Issue streaming a response from a Bedrock agent


I would like to stream a response from a Bedrock agent to the user. I'm working in Python with boto3 and AgentsforBedrockRuntime. Under the hood, agents use the InvokeAgent API, which is not built for streaming. Are there any temporary solutions to this issue? Is the Bedrock team considering implementing this in the near future? Is there a way to see the roadmap for Bedrock?

I think this post (not mine) articulates the issue well.

Thanks in advance!

Asked 1 year ago · Viewed 4,275 times
2 Answers

You could consider using AWS Lambda's response payload streaming feature, which allows functions to progressively stream response payloads back to clients. This can be particularly useful when working with AI models that support streaming. If you're working in Python, you may need to create a custom runtime for AWS Lambda, as response streaming is not directly supported by the managed Python runtime.

Here is the documentation for the API: https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent-runtime_InvokeAgent.html
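
If it helps to see the shape of the current API, here is a minimal sketch (the agent ID, alias ID, and session values are placeholders) of calling InvokeAgent from a Python Lambda handler with boto3. The `completion` field is an event stream, but its chunks only arrive after the agent has finished its intermediate steps, and pushing them onward to the client progressively would still require Lambda response streaming, which the managed Python runtime doesn't provide out of the box.

```python
# Minimal sketch, not an official pattern: calling InvokeAgent from a Lambda
# handler with boto3. Agent/alias IDs and session values are placeholders.
import boto3

client = boto3.client("bedrock-agent-runtime")

def handler(event, context):
    response = client.invoke_agent(
        agentId="AGENT_ID",             # placeholder
        agentAliasId="AGENT_ALIAS_ID",  # placeholder
        sessionId=event.get("sessionId", "demo-session"),
        inputText=event["inputText"],
    )

    # "completion" is an event stream, but the chunks arrive only after the
    # agent's orchestration steps finish, so this is not token-level streaming
    # of the whole agent run.
    completion = ""
    for item in response["completion"]:
        chunk = item.get("chunk")
        if chunk:
            completion += chunk["bytes"].decode("utf-8")

    return {"completion": completion}
```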

EXPERT
answered 1 year ago
  • I don't think adding a layer in front of Bedrock that receives the whole response and then streams it out would really help: it only adds latency if the layer doesn't already exist, and if a Lambda is already present in the architecture, the Bedrock call should account for the clear majority of the overall latency - I wouldn't expect the transmission from Lambda to the client to be significant.


An important thing to consider about the agent pattern is that the final response is typically the product of multiple LLM calls, often chained together so that the output of one step is used in the input to the next.

This substantially reduces the value of response streaming for agents compared with the plain LLM-calling ConverseStream and InvokeModelWithResponseStream APIs: only the last generation in the agent flow can be meaningfully streamed, so the client still waits with no output through the intermediate steps.
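
For comparison, here is a minimal sketch (the model ID is just an example placeholder) of how the plain ConverseStream API delivers text deltas as the model generates them; with an agent, this kind of incremental output would only apply to the final generation in the chain.

```python
# Minimal sketch for contrast: token-level streaming from a plain model call
# via the Converse API. The model ID below is an example placeholder.
import boto3

bedrock = boto3.client("bedrock-runtime")

response = bedrock.converse_stream(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",  # example model ID
    messages=[{"role": "user", "content": [{"text": "Hello!"}]}],
)

# Text deltas arrive as the model generates them, unlike the agent flow,
# where orchestration steps complete before any output is returned.
for event in response["stream"]:
    delta = event.get("contentBlockDelta", {}).get("delta", {})
    if "text" in delta:
        print(delta["text"], end="", flush=True)
```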

I can't really comment on roadmap or timelines, but for potential alternatives with the current API I'd suggest maybe:

  • Testing faster models, or optimizing or removing prompt steps in the agent, to try to reduce response latency subject to your quality requirements (an automated testing framework like AWSLabs agent-evaluation might help you test these optimizations against a suite of example conversations)
  • Making more basic, UI-side changes to your application to reassure users that the model is working on a response: a typing/thinking bubble, a progress wheel, disabling the send button, etc.

Again, even if/when a streaming feature becomes available in this API, I'd avoid assuming it'll be a massive change in perceived latency for your users - unless your agent often outputs very long messages, in which case streaming even just the final generation in the chain would help.

AWS
EXPERT
answered 1 year ago
EXPERT
reviewed 1 year ago
