1 Answer
- Newest
- Most votes
- Most comments
3
As far as I understand your case, the "XML Leak" occurs when the model's internal tool-calling logic is streamed into the output buffer before the orchestration layer can intercept it. For voice-based agents, this is critical because the TTS engine processes these tokens immediately.
I would try the following:
- Try to switch to the Bedrock Converse API: Instead of manual XML prompting, use the native toolConfig in the Converse API. It strictly separates toolUse blocks from the message content, preventing the audio engine from seeing the XML tags.
- Define Stop Sequences: Add
<or<__(two_) as Stop Sequences in your inference configuration. This forces the model to stop generating the user-facing response the moment it attempts to invoke a tool. - System Prompt Enforcement: Add a directive: "Internal tool calls (XML) must never be part of the verbal response. Use tools silently."
- Audio Latency Check: Ensure your TTS generation is only triggered by the content block and ignores any tool_use blocks provided by the model response.
Relevant content
- asked 5 months ago
- asked 4 months ago
