Amazon Nova Sonic 2 — Speaks Special Characters in Voice Output During Bidirectional Streaming

0

We are using Amazon Nova Sonic 2 with bidirectional streaming via InvokeModelWithBidirectionalStream, along with multiple configured tools. A large portion of our content is in Indian languages, so the responses sometimes include Unicode or formatting-related special characters. Since Nova Sonic 2 is a speech-to-speech model, we observe that it occasionally speaks special characters aloud instead of ignoring them or handling them naturally in the speech output, for example:

  • Asterisks (*)
  • Bullet points
  • Other formatting symbols

We tried mitigating this through the system prompt by instructing the model not to speak such characters, but the issue still occurs intermittently. Has anyone else faced this issue? Are there any recommended best practices, output sanitisation methods, or model settings to prevent special characters from being spoken in voice responses?

asked 17 days ago · 48 views
2 Answers
1

Correction on Speech-to-Speech (S2S) architecture:

Since Nova Sonic 2 generates audio natively within the bidirectional stream, the suggestion to "sanitize text before conversion" is difficult to implement: the model is likely synthesizing the audio for these characters at the same time as it produces the text, so there is no separate text-to-speech step to intercept.

To fix this, focus on the input and generation style:

  • Sanitize Tool Inputs: If your tools return Markdown (e.g., **text** or * bullet), strip these symbols before passing the tool results back to the model.
  • Enforce "Verbal-Only" Output: Update your system prompt to explicitly forbid Markdown. Use: "Generate responses for speech only. Do not use Markdown, asterisks, or bullet points. Use natural verbal transitions instead."
  • Unicode Normalization: For Indian languages, ensure your input text/tool results are normalized (NFC) to prevent the model from misinterpreting dangling Unicode modifiers as symbols to be spoken.

If the model still speaks "asterisk," it is treating the Markdown as literal transcript; your only control point is the text you feed into the model's context via tool results or prompts.
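
The tool-result sanitization and NFC normalization described above can be sketched roughly as follows. This is a minimal illustration, not an official AWS utility: the function names `sanitize_for_speech` and `sanitize_tool_result` are hypothetical, the set of symbols stripped is an assumption you would tune for your own content, and it only uses the Python standard library. It is meant to run on tool output before you send the tool result back into the stream.

```python
import re
import unicodedata

# Hypothetical helpers (not part of any AWS SDK). The symbol sets below are
# assumptions; extend them to match what your tools actually return.
_HEADING_RE = re.compile(r"^[ \t]*#{1,6}[ \t]+", re.MULTILINE)        # "## Title"
_BULLET_RE = re.compile(r"^[ \t]*[*\-+•‣◦▪][ \t]+", re.MULTILINE)     # "* item", "• item"
_EMPHASIS_RE = re.compile(r"[*_`~]+")                                 # **bold**, _it_, `code`
_SPACES_RE = re.compile(r"[ \t]{2,}")


def sanitize_for_speech(text: str) -> str:
    """Normalize to NFC and strip common Markdown formatting symbols."""
    text = unicodedata.normalize("NFC", text)
    text = _HEADING_RE.sub("", text)   # drop heading markers at line start
    text = _BULLET_RE.sub("", text)    # drop bullet markers at line start
    text = _EMPHASIS_RE.sub("", text)  # drop leftover emphasis/code markers
    text = _SPACES_RE.sub(" ", text)   # collapse runs of spaces left behind
    return text.strip()


def sanitize_tool_result(value):
    """Recursively sanitize every string inside a JSON-like tool result."""
    if isinstance(value, str):
        return sanitize_for_speech(value)
    if isinstance(value, list):
        return [sanitize_tool_result(item) for item in value]
    if isinstance(value, dict):
        return {key: sanitize_tool_result(item) for key, item in value.items()}
    return value  # numbers, booleans, None pass through unchanged


if __name__ == "__main__":
    raw = {"summary": "**Balance**: 500", "items": ["* नमस्ते", "• second"], "count": 2}
    print(sanitize_tool_result(raw))
```

Note that stripping underscores and backticks also alters identifiers such as snake_case names; for text that is only ever spoken this is usually acceptable, but it is a design choice worth checking against your own tool outputs.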

EXPERT
answered 17 days ago
0

Amazon Nova Sonic is designed to provide natural, human-like conversational AI experiences with contextual richness across supported languages, including multilingual support with expressive voices. The model uses a bidirectional streaming architecture where audio input and model responses flow simultaneously over an open stream.

While the available documentation covers the core capabilities of Amazon Nova Sonic, including its adaptive speech response that dynamically adjusts delivery based on input speech prosody and its support for multilingual interactions, there isn't specific guidance provided regarding the handling of special characters or formatting symbols in voice output during bidirectional streaming sessions.

The bidirectional streaming API architecture involves structured JSON events that control session lifecycle, audio streaming, and text responses. The model processes input and streams responses asynchronously, including automatic speech recognition results, tool use invocations, text responses, and audio output for playback.

For your specific use case involving Indian languages with Unicode or formatting characters, you may want to consider implementing output sanitization at the application layer before the text is converted to speech. This could involve preprocessing the model's text responses to remove or replace special characters like asterisks, bullet points, and other formatting symbols before they reach the audio generation stage. Since you're using multiple configured tools with the bidirectional streaming API, you could potentially add a filtering step in your response handler that processes the text chunks before they're converted to audio output.

Given that system prompt instructions haven't fully resolved the issue, application-level text preprocessing may be a more reliable approach to ensure consistent handling of these characters across all responses.
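
As a rough sketch of what a filtering step in the response handler could look like, the snippet below inspects text chunks and records which ones contain formatting symbols. It assumes you have already extracted the text from the stream's text responses into plain strings; the names `find_formatting_symbols` and `SymbolMonitor` are hypothetical, and no Bedrock API calls are shown. Because the audio may be produced natively alongside the text, this is best treated as a way to measure how often the problem occurs (for example, before and after a prompt or tool-output change) rather than as a guaranteed fix for the audio.

```python
import re
from typing import List

# Assumed set of formatting symbols; adjust to what you see in your transcripts.
FORMATTING_RE = re.compile(r"[*_`~#]+|[•‣◦▪]|^[ \t]*[-+][ \t]+", re.MULTILINE)


def find_formatting_symbols(chunk: str) -> List[str]:
    """Return the formatting symbols (or runs of them) found in a text chunk."""
    return [match.group(0).strip() for match in FORMATTING_RE.finditer(chunk)]


class SymbolMonitor:
    """Tracks how many text chunks contained formatting symbols."""

    def __init__(self) -> None:
        self.total_chunks = 0
        self.flagged_chunks: List[str] = []

    def observe(self, chunk: str) -> List[str]:
        """Record one chunk and return any formatting symbols found in it."""
        self.total_chunks += 1
        symbols = find_formatting_symbols(chunk)
        if symbols:
            self.flagged_chunks.append(chunk)
        return symbols

    def flagged_ratio(self) -> float:
        """Fraction of observed chunks that contained formatting symbols."""
        if self.total_chunks == 0:
            return 0.0
        return len(self.flagged_chunks) / self.total_chunks


if __name__ == "__main__":
    monitor = SymbolMonitor()
    for text in ["Hello there", "Your balance is **500**", "- first item"]:
        print(repr(text), monitor.observe(text))
    print("flagged ratio:", monitor.flagged_ratio())
```
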
Sources
Using the Amazon Nova Sonic Speech-to-Speech model - Amazon Nova
Make your web apps hands-free with Amazon Nova Sonic | Artificial Intelligence
Using the Bidirectional Streaming API - Amazon Nova

answered 17 days ago
