
Issue streaming a response from a Bedrock agent


I would like to stream a response from a Bedrock agent to the user. I'm working in Python with boto3 and AgentsforBedrockRuntime. Under the hood, agents use the InvokeAgent API, which is not built for streaming. Are there any temporary solutions to this issue? Is the Bedrock team considering implementing this in the near future? Is there a way to see the roadmap for Bedrock?
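For reference, here's a minimal sketch of the call I'm making today (the agent and alias IDs are placeholders); `completion` comes back as an event stream, but as far as I can tell the chunk(s) only arrive after the agent has finished its orchestration, so there's no token-level streaming:

```python
import uuid
import boto3

client = boto3.client("bedrock-agent-runtime")

response = client.invoke_agent(
    agentId="AGENT_ID",             # placeholder
    agentAliasId="AGENT_ALIAS_ID",  # placeholder
    sessionId=str(uuid.uuid4()),
    inputText="What is my account balance?",
)

# `completion` is an EventStream, but the chunks only show up once the agent
# has finished its internal steps -- not incrementally as tokens are generated.
for event in response["completion"]:
    if "chunk" in event:
        print(event["chunk"]["bytes"].decode("utf-8"), end="", flush=True)
```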

I think this post (not mine) articulates the issue well.

Thanks in advance!

Asked a year ago · 4.3K views
2 answers

You could consider using AWS Lambda's response payload streaming feature, which allows functions to progressively stream response payloads back to clients. This can be particularly useful when working with AI models that support streaming. If you're working with Python, you might need to create a custom runtime for AWS Lambda, as response streaming is not natively supported by the managed Python runtime.

Here is the documentation for the InvokeAgent API: https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent-runtime_InvokeAgent.html
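As a rough sketch of that idea (not a drop-in solution, and the agent/alias IDs are placeholders): if you put the agent behind something that can stream HTTP responses - for example a small FastAPI app running behind the Lambda Web Adapter, or any other streaming-capable host - you can relay the InvokeAgent completion chunks to the client as they arrive:

```python
import uuid
import boto3
from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()
client = boto3.client("bedrock-agent-runtime")

def agent_chunks(question: str):
    """Yield InvokeAgent completion chunks as they are received."""
    response = client.invoke_agent(
        agentId="AGENT_ID",             # placeholder
        agentAliasId="AGENT_ALIAS_ID",  # placeholder
        sessionId=str(uuid.uuid4()),
        inputText=question,
    )
    for event in response["completion"]:
        if "chunk" in event:
            yield event["chunk"]["bytes"]

@app.get("/ask")
def ask(q: str):
    # StreamingResponse flushes each yielded chunk to the client as it arrives.
    return StreamingResponse(agent_chunks(q), media_type="text/plain")
```

Keep in mind this only streams whatever InvokeAgent itself returns - it does not make the agent produce its answer incrementally.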

EXPERT
Answered a year ago
  • I don't think adding a layer in front of Bedrock that receives the whole response and then streams it out would really help: it only adds latency if the layer doesn't already exist, and if a Lambda is already present in the architecture, the Bedrock call should account for the clear majority of the overall latency. I wouldn't expect the transmission from Lambda to the client to be significant.


An important thing to consider about the agent pattern is that the final response is typically the product of multiple LLM calls, often chained together so that the output of one is used as input to the next.

This substantially reduces the value of response streaming for agents compared with the plain LLM-calling ConverseStream and InvokeModelWithResponseStream APIs: only the last generation in the agent flow can be meaningfully streamed, so the client still waits with no output through all of the intermediate steps.
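For comparison, this is roughly what token-level streaming looks like with those plain model APIs today (a minimal sketch; the model ID is just an example, substitute any streaming-capable model you have access to):

```python
import boto3

client = boto3.client("bedrock-runtime")

# Example model ID -- use whichever streaming-capable model you have access to.
response = client.converse_stream(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",
    messages=[{"role": "user", "content": [{"text": "Explain event streams briefly."}]}],
)

# Text deltas arrive incrementally, unlike an agent's final-answer chunks.
for event in response["stream"]:
    if "contentBlockDelta" in event:
        print(event["contentBlockDelta"]["delta"].get("text", ""), end="", flush=True)
```

With an agent, only the equivalent of that final generation could be streamed this way; everything before it still has to complete first.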

I can't really comment on roadmap or timelines, but for potential alternatives with the current API I'd suggest maybe:

  • Testing faster models, or optimizing or removing prompt steps in the agent, to reduce response latency subject to your quality requirements (an automated testing framework like AWSLabs agent-evaluation might help you test these optimizations against a suite of example conversations)
  • Making more basic, UI-side changes to your application to reassure users that the model is working on a response, such as a typing/thinking bubble, a progress wheel, or disabling the send button

Again, even if/when a streaming feature becomes available in this API, I'd avoid assuming it will be a massive change in perceived latency for your users - unless your agent often outputs very long messages, where even streaming just the final generation in the chain would help.

AWS
EXPERT
Answered a year ago
EXPERT
Reviewed a year ago


