Getting User Utterance as Text in Amazon Sumerian


I have a Sumerian Host that acts basically as a front-end for a Lex chatbot.

However, in some cases I need to do some processing based on the actual user utterance (that is, the text of what the user says). Is there a way to use the "Send Audio to Lex" action (or a different one) to get a text version of the user's audio, that is, to perform speech-to-text?

asked 9 months ago, 46 views
1 Answer
Accepted Answer

Hello Maxi,

If you only need to transcribe the user's input, Amazon Transcribe would be a better fit. However, if you need to process the user's input in the context of the bot, you can attach a Lambda function to the bot and read the "inputTranscript" field of the event to get the text of what the user said.
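As a rough illustration of the Lambda approach, here is a minimal sketch of a dialog code hook, assuming the Lex V1 event and response format (which is what Sumerian's Lex integration used). It reads "inputTranscript" and copies it into the session attributes so it travels back with the bot's reply. The attribute key `lastUtterance` is a made-up name for this example, not something Lex defines.

```python
def lambda_handler(event, context):
    """Dialog code hook: expose the user's utterance via session attributes."""
    # Lex puts the recognized text in the top-level "inputTranscript" field.
    transcript = event.get("inputTranscript", "")

    # "sessionAttributes" can be null in the incoming event, so default to {}.
    session_attributes = dict(event.get("sessionAttributes") or {})
    # Hypothetical key name chosen for this sketch.
    session_attributes["lastUtterance"] = transcript

    # "Delegate" hands control back to Lex so the bot continues as configured.
    return {
        "sessionAttributes": session_attributes,
        "dialogAction": {
            "type": "Delegate",
            "slots": event["currentIntent"]["slots"],
        },
    }
```

Any additional server-side processing of the utterance would go where the transcript is read, before the response is built.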


answered 9 months ago
  • Maxi, do you want to do the additional input text processing on the client side (inside of Sumerian using JavaScript) or on the server side? If you want the processing to happen on the server side, then the answer swapandeepataws provided is good. However, if your goal is to do some processing on the client side directly in Sumerian, that's possible, too. Let me know if that's your objective. I'd be happy to provide some sample code.

  • Also, a clarification on swapandeepataws's answer. Amazon Sumerian doesn't offer direct integration of Amazon Transcribe as it does for Lex and Polly. If you want to integrate Transcribe with Sumerian you can, but it requires linking the AWS JavaScript SDK into your Sumerian project and writing custom JavaScript code to interact with that SDK.

  • Thanks Kris, saw your comments too late, but as you can see it is solved.

  • Hi! I added the lambda and used the "inputTranscript" as you suggested and it works fine.

    JFYI, as I always want to have a transcript on the Sumerian side, I created a bot with a "fake" intent that has only one utterance, which is never matched. I added an AMAZON.FallbackIntent that calls my Lambda function. The Lambda simply retrieves "inputTranscript" and returns it in a properly formatted Lex reply.

    The result is that, whatever the user says, the fall back intent is always activated, which calls the Lambda, which returns user utterance as message. I can then take the message and process it in any way I want.

    Thanks for the hint.
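The fallback-intent setup described above could look roughly like the following. This is a sketch under the assumption of the Lex V1 response format, not the poster's actual function. The helper name `build_close_response` and the placeholder text for an empty transcript are invented for illustration.

```python
def build_close_response(text):
    """Wrap text in a Lex V1 'Close' response (helper name is hypothetical)."""
    return {
        "dialogAction": {
            "type": "Close",
            "fulfillmentState": "Fulfilled",
            "message": {"contentType": "PlainText", "content": text},
        }
    }


def lambda_handler(event, context):
    """Fulfillment hook for the fallback intent: echo the utterance back."""
    # The recognized user text arrives in the top-level "inputTranscript" field.
    transcript = event.get("inputTranscript", "")
    # Message content should not be empty; this placeholder is an assumption.
    return build_close_response(transcript or "(no transcript)")
```

On the Sumerian side, the bot's reply message then carries the user's own words, which the scene can process as needed.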
