Bedrock citations span indices inconsistent during streaming (Agents / RetrieveAndGenerate)

We are seeing incorrect citation span indices when using Amazon Bedrock Knowledge Base citations with both:

invoke_agent and RetrieveAndGenerate

Our application streams the response and uses the returned span to insert citation hyperlinks. Example citation object:

{
  "generatedResponsePart": {
    "textResponsePart": {
      "span": {
        "start": 0,
        "end": 248
      }
    }
  }
}

Example event order:

chunk(bytes)        → "... dashboar"
chunk(attribution)  → span(start=0,end=248)
chunk(bytes)        → "d displays approval requests..."

When the UI applies the span index as soon as the attribution arrives, the citation is inserted in the middle of a word: "... dashboar[1]d displays approval requests..."

In other cases, we have observed:

  • Hyperlinks missing entirely for some questions
  • Citations misaligned with the text they refer to
  • Citations appearing only at the very end of the answer
  • An end index that is beyond the length of the text

Question: Is this expected behavior for Bedrock Knowledge Base citations?

Specifically:

  • Are citation spans guaranteed to align exactly with the generated answer text returned in output.text?
  • Can spans differ from the visible answer text due to internal formatting (e.g., hidden characters or newline normalization)?
  • Is there a recommended approach for reliably mapping citation spans to UI hyperlinks?

Any guidance would be appreciated.

1 Answer

Based on the Amazon Bedrock API documentation, citation spans are designed to indicate where text with a citation begins and ends in the generated output. The span object contains start and end integer fields that mark these positions in the generated text.

However, the behavior you're experiencing with streaming responses reveals an important consideration: citation spans refer to positions in the complete generated text, not necessarily aligned with the streaming chunk boundaries. This is why you're seeing the citation attribution arrive in the middle of a word ("dashboar" + citation + "d"), causing misalignment when applied immediately.

Key Points:

  1. Citation spans are absolute positions: The start and end indices in the span refer to character positions in the final, complete generated text - not relative to individual streaming chunks.

  2. Streaming timing issues: During streaming, citation metadata can arrive at any point in the stream, potentially before, during, or after the actual text it references has been transmitted. This is what's causing your insertion point problems.
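To illustrate the two points above, here is a minimal Python sketch that accumulates streamed text and holds each span back until the text it refers to has arrived. The event shape is a simplified assumption for illustration (a dict with either a "bytes" or a "span" key), not the exact structure returned by invoke_agent or RetrieveAndGenerate, and the function name is hypothetical. It also assumes that span offsets count characters of the decoded text, which you should verify against your own responses.

```python
import codecs
from typing import Iterable, Iterator


def stream_with_deferred_spans(events: Iterable[dict]) -> Iterator[tuple[str, list[dict]]]:
    """Yield (text_so_far, spans_now_covered) after each simplified event.

    A span is released only once the accumulated text is at least as long
    as the span's end offset (assumed to be a character offset).
    """
    # An incremental decoder avoids errors when a multi-byte character
    # is split across two chunks.
    decoder = codecs.getincrementaldecoder("utf-8")()
    text = ""
    pending: list[dict] = []
    for event in events:
        text += decoder.decode(event.get("bytes", b""))
        if "span" in event:
            pending.append(event["span"])
        ready = [s for s in pending if s["end"] <= len(text)]
        pending = [s for s in pending if s["end"] > len(text)]
        yield text, ready


# Hypothetical events mirroring the order described in the question.
events = [
    {"bytes": b"The dashboar"},
    {"span": {"start": 0, "end": 13}},
    {"bytes": b"d displays approval requests."},
]
results = list(stream_with_deferred_spans(events))
```

In this sketch the span is not released when it arrives, because only 12 characters have been received; it is released with the next text chunk. Note that any span whose end exceeds the final text length would never be released, so such spans still need handling once the stream is complete.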

Recommended Approach for Streaming with Citations:

  • Buffer the complete response: Collect all streaming chunks and citation metadata before applying the citations to the final text
  • Apply citations post-streaming: Once you have the complete generated text and all citation metadata, use the span indices to insert your hyperlinks at the correct positions
  • Validate span boundaries: Before applying citations, verify that the end index doesn't exceed the actual text length, and ensure spans align with word boundaries for better user experience
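The steps above could be sketched in Python roughly as follows. This is an illustration under stated assumptions, not an official implementation: apply_citations is a hypothetical helper, each span is assumed to be a dict with "start" and "end" keys as in the question's example, and "end" is treated as a character offset at which the marker is inserted (whether the service's end index is inclusive or exclusive should be checked against real responses).

```python
def apply_citations(text: str, spans: list[dict]) -> str:
    """Insert [n] markers into the complete text, one per span.

    Out-of-range end offsets are clamped to the text length, and an offset
    that falls inside a word is moved forward to the end of that word.
    """
    markers = []
    for number, span in enumerate(spans, start=1):
        # Clamp the end offset into the valid range (assumption: character offsets).
        end = min(max(span["end"], 0), len(text))
        # If the offset lands between two alphanumeric characters, advance to the word end.
        while 0 < end < len(text) and text[end].isalnum() and text[end - 1].isalnum():
            end += 1
        markers.append((end, number))
    # Insert from right to left so earlier offsets are not shifted.
    for end, number in sorted(markers, reverse=True):
        text = text[:end] + f"[{number}]" + text[end:]
    return text


full_text = "The dashboard displays approval requests."
spans = [{"start": 0, "end": 12}, {"start": 14, "end": 999}]
linked = apply_citations(full_text, spans)
```

Here the first span ends mid-word and is moved to the end of "dashboard", while the second span's out-of-range end is clamped to the end of the text. Replacing the "[n]" marker with a hyperlink is left to the UI layer.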

The citation spans should align with the complete generated text in output.text, but attempting to apply them in real-time during streaming will result in the misalignments you're observing. The safest approach is to treat citation application as a post-processing step after the full response has been received.

If you need further assistance or specific recommendations for your use case, please reach out to AWS Support and open a case using the link below:

[+] Creating support cases and case management - https://docs.aws.amazon.com/awssupport/latest/user/case-management.html#creating-a-support-case

Sources
TextResponsePart - Amazon Bedrock
Span - Amazon Bedrock
retrieve_and_generate - Boto3 1.42.50 documentation
retrieve_and_generate - botocore 1.42.50 documentation

answered 2 months ago
AWS
SUPPORT ENGINEER
reviewed 2 months ago
AWS
SUPPORT ENGINEER
revised 2 months ago
