To let Nova Sonic handle both voice and text input in the same conversation, you need to understand how the bidirectional streaming API works and adjust your event flow accordingly.
Nova Sonic uses an event-driven architecture with structured input and output events. When you want to switch from voice to text input during an ongoing conversation, you need to follow the same event pattern but change the content type.
Here's how you can implement this functionality:
- For voice input, you typically send audio frames as audioInput events within a content container (started with contentStart and ended with contentEnd).
- To switch to text input during the same conversation, you would:
  - Close any open audio stream with a contentEnd event (if you were previously using voice)
  - Send a new contentStart event with the role set to USER
  - Send a textInput event containing the user's typed message
  - Send a contentEnd event to close this text input segment
- You can then switch back to voice input later by following the same pattern but using audioInput events instead.
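The text-input steps above can be sketched as a small helper that builds the JSON events in order. This is a minimal sketch, not a definitive implementation: the function name `text_turn_events` is hypothetical, and the field names (`promptName`, `contentName`, `type`, `interactive`, `textInputConfiguration`) are assumptions about the event schema that you should check against the input-events documentation listed under Sources. Actually sending the events over the bidirectional stream is left out.

```python
import json
import uuid


def text_turn_events(prompt_name, text, open_audio_content=None):
    """Build the JSON events for one typed user message, in send order.

    prompt_name: the prompt/session name already used in this conversation.
    open_audio_content: contentName of an audio container that is still
        open, or None if no voice input is in progress.
    """
    events = []

    # Step 1: close the open audio container, if there is one.
    if open_audio_content is not None:
        events.append({"event": {"contentEnd": {
            "promptName": prompt_name,
            "contentName": open_audio_content,
        }}})

    # Step 2: open a new content container for the USER role.
    # Field names below are assumed; verify them against the docs.
    content_name = str(uuid.uuid4())
    events.append({"event": {"contentStart": {
        "promptName": prompt_name,
        "contentName": content_name,
        "type": "TEXT",
        "interactive": True,
        "role": "USER",
        "textInputConfiguration": {"mediaType": "text/plain"},
    }}})

    # Step 3: send the typed message.
    events.append({"event": {"textInput": {
        "promptName": prompt_name,
        "contentName": content_name,
        "content": text,
    }}})

    # Step 4: close the text container.
    events.append({"event": {"contentEnd": {
        "promptName": prompt_name,
        "contentName": content_name,
    }}})

    return [json.dumps(e) for e in events]
```

Calling `text_turn_events("my-prompt", "What is my balance?", open_audio_content="audio-1")` returns four JSON strings that you would write to the stream in order; without `open_audio_content` it returns three.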
The key is maintaining the same conversation session while changing the input modality. Since Nova Sonic's architecture is designed to process different types of content within the same conversation flow, you can alternate between voice and text inputs as needed.
Make sure your application's UI provides clear options for users to switch between input modes, and that your backend properly handles the transition between these modes by sending the appropriate event sequences to Nova Sonic.
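On the backend, one way to keep transitions consistent is to track which content container is currently open and always close it before opening the next one. The sketch below is illustrative only: `InputModeTracker` and its methods are hypothetical names, and it returns `(event_type, content_name)` pairs rather than building or sending the real event payloads.

```python
import uuid


class InputModeTracker:
    """Tracks the open content container so a mode switch closes it first."""

    def __init__(self):
        self.mode = None          # "audio", "text", or None if nothing is open
        self.content_name = None  # name of the currently open container

    def switch(self, new_mode):
        """Return the (event_type, content_name) steps needed to open new_mode."""
        if new_mode not in ("audio", "text"):
            raise ValueError(f"unknown input mode: {new_mode!r}")
        steps = self.close()
        self.mode = new_mode
        self.content_name = str(uuid.uuid4())
        steps.append(("contentStart", self.content_name))
        return steps

    def close(self):
        """Return the contentEnd step for the open container, if any."""
        if self.mode is None:
            return []
        steps = [("contentEnd", self.content_name)]
        self.mode = None
        self.content_name = None
        return steps
```

A UI toggle between microphone and keyboard would then call `tracker.switch("text")` or `tracker.switch("audio")` and translate each returned step into the corresponding event for the same session.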
Sources
Handling input events with the bidirectional API - Amazon Nova
Using the Amazon Nova Sonic Speech-to-Speech model - Amazon Nova
