
AWS Polly Speech Generation: Is it possible to produce speech plus speech marks in one call?


I need both speech output (as .mp3) and speech marks output (as .json). Currently, following the documentation, I make two CLI calls on the same text: one to generate each output.
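For reference, the two-call approach looks roughly like this in Python. This is a minimal sketch: the function names are hypothetical, and `polly` is assumed to be a boto3 Polly client (`boto3.client("polly")`). The first request uses `OutputFormat="mp3"`; the second repeats the same text and voice with `OutputFormat="json"` and `SpeechMarkTypes`. Each response's `RequestCharacters` field (number of characters synthesized) shows what each request reports.

```python
import json
from typing import Any


def synthesize_audio_and_marks(polly: Any, text: str, voice_id: str,
                               mp3_path: str, marks_path: str) -> int:
    """Make the two SynthesizeSpeech calls: audio first, then speech marks.

    `polly` is assumed to be a boto3 Polly client. Returns the sum of the
    RequestCharacters values reported by the two responses.
    """
    total_chars = 0

    # Call 1: the audio itself.
    audio = polly.synthesize_speech(
        Text=text, VoiceId=voice_id, OutputFormat="mp3")
    with open(mp3_path, "wb") as f:
        f.write(audio["AudioStream"].read())
    total_chars += audio.get("RequestCharacters", 0)

    # Call 2: same text and voice, but JSON output plus the mark types wanted.
    marks = polly.synthesize_speech(
        Text=text, VoiceId=voice_id, OutputFormat="json",
        SpeechMarkTypes=["sentence", "word"])
    raw = marks["AudioStream"].read().decode("utf-8")
    with open(marks_path, "w", encoding="utf-8") as f:
        f.write(raw)
    total_chars += marks.get("RequestCharacters", 0)

    return total_chars


def parse_speech_marks(raw: str) -> list[dict]:
    """Speech marks are returned as one JSON object per line."""
    return [json.loads(line) for line in raw.splitlines() if line.strip()]
```

Usage: `synthesize_audio_and_marks(boto3.client("polly"), "Hello world", "Joanna", "out.mp3", "out.json")`.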

I believe that means I'm getting billed twice for the same text. Is this correct? Are you billed twice, once for the audio and again for the speech marks?

It also takes two calls and more time, and it seems to duplicate effort: the server must be doing the same computation twice.
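If two requests are unavoidable, one way to reduce the extra wall-clock time (not the cost) is to send them concurrently instead of one after the other. A minimal sketch, assuming a boto3 Polly client is passed in (boto3's low-level clients are documented as thread-safe); the function name is hypothetical:

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any


def synthesize_both_concurrently(polly: Any, text: str,
                                 voice_id: str) -> tuple[bytes, bytes]:
    """Issue the audio and speech-mark requests at the same time.

    This only shortens elapsed time; it is still two SynthesizeSpeech
    requests. `polly` is assumed to be a boto3 Polly client.
    """
    def call(**extra: Any) -> bytes:
        # Shared parameters for both requests; `extra` selects the output type.
        resp = polly.synthesize_speech(Text=text, VoiceId=voice_id, **extra)
        return resp["AudioStream"].read()

    with ThreadPoolExecutor(max_workers=2) as pool:
        audio_future = pool.submit(call, OutputFormat="mp3")
        marks_future = pool.submit(call, OutputFormat="json",
                                   SpeechMarkTypes=["word"])
        # .result() blocks until each request finishes (and re-raises errors).
        return audio_future.result(), marks_future.result()
```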

Is there a way to generate both the speech audio (mp3) and the speech marks (json) in a single call, and/or a way to pay once rather than twice for the same text?

Related question: I also need multiple speech variants of the same text to support end-user preferences (e.g., different voices, different speed/prosody). Is there a way to batch-generate multiple sets of speech output from a single call to reduce the cost in this situation, rather than paying for the full text again for each small variant?

I'd prefer to do this with the CLI, but tasks, the JavaScript SDK, the Python SDK, etc. are also fine.

Here are the docs with examples of the calls to generate audio and SpeechMarks:

Thanks for your help

asked 12 days ago, 22 views
1 Answer


It's not possible to get both speech audio output (mp3) and speech marks (json) from a single API call. One API call returns either audio output or speech marks output, not both. It's also currently not possible to pass multiple values for an input parameter (for example, several voices) in the same API call. Each speech variant requires its own API call.
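Given that, generating variants means one request per variant per output type. A minimal sketch, assuming a boto3 Polly client is passed in; `Variant` and `synthesize_variants` are hypothetical names. Speed is expressed here with the SSML `<prosody rate="…">` tag, which needs `TextType="ssml"`:

```python
from dataclasses import dataclass
from typing import Any, Iterable
from xml.sax.saxutils import escape


@dataclass(frozen=True)
class Variant:
    name: str             # hypothetical label, e.g. used for output file names
    voice_id: str         # e.g. "Joanna", "Matthew"
    rate: str = "medium"  # SSML prosody rate: "slow", "fast", "90%", ...


def to_ssml(text: str, rate: str) -> str:
    # Escape &, <, > so arbitrary text is valid inside the SSML document.
    return f'<speak><prosody rate="{rate}">{escape(text)}</prosody></speak>'


def synthesize_variants(polly: Any, text: str,
                        variants: Iterable[Variant]) -> dict[str, tuple[bytes, bytes]]:
    """Return {variant name: (mp3 bytes, speech-mark bytes)}.

    Makes 2 * len(variants) SynthesizeSpeech requests: there is no batch form.
    """
    results: dict[str, tuple[bytes, bytes]] = {}
    for v in variants:
        common = dict(Text=to_ssml(text, v.rate), TextType="ssml",
                      VoiceId=v.voice_id)
        audio = polly.synthesize_speech(OutputFormat="mp3", **common)
        marks = polly.synthesize_speech(OutputFormat="json",
                                        SpeechMarkTypes=["word"], **common)
        results[v.name] = (audio["AudioStream"].read(),
                           marks["AudioStream"].read())
    return results
```

Keeping audio and speech marks on the same SSML input means the two outputs for a variant describe the same rendering.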

answered 8 days ago
