
Questions tagged with Amazon Polly


2 answers, 0 votes, 21 views, asked 2 years ago

<break> Inconsistencies?

There seem to be some inconsistencies in how the Amazon Polly console (and files processed through the CLI or API) handles non-coded pauses, depending on whether <break> is used and where </speak> is placed in the file. Here are some examples to illustrate the problem; it doesn't matter whether a neural voice or a standard voice is used.

Example 1:

```
<speak>Article 1
Video provides a powerful way to help you prove your point. When you click Online Video, you can paste in the embed code for the video you want to add. You can also type a keyword to search online for the video that best fits your document.
</speak >
```

A pause is inserted after "Article 1" since there is a paragraph break, which is the expected result.

Example 2:

```
<speak>Article 1
Video provides a powerful way to help you prove your point. When you click Online Video <break time = ".3s"/> you can paste in the embed code for the video you want to add. You can also type a keyword to search online for the video that best fits your document.
</speak >
```

A <break> is inserted in place of the comma. When this is done, the paragraph break after "Article 1" is ignored, so there is no pause between "Article 1" and "Video". This is unexpected.

Example 3:

```
<speak>Article 1
Video provides a powerful way to help you prove your point. When you click Online Video, you can paste in the embed code for the video you want to add. You can also type a keyword to search online for the video that best fits your document.</speak >
```

Similar to Example 1, a pause **should** occur after "Article 1" since there is a paragraph break. It does not. Notice that there is no line break before </speak>.

Any clarification would be most helpful, as results are currently inconsistent and it would be time-consuming to insert `<p></p>` throughout to get the expected results.

Edited by: vabtm on Jul 3, 2020 1:57 PM
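
If inserting `<p></p>` by hand is the main obstacle, the tagging can be scripted. The following is a minimal sketch, not an official Polly utility: the function name `text_to_ssml` is hypothetical, and it assumes the input is plain text with one paragraph per line and no SSML tags of its own (any `<` or `&` in the input is escaped, so embedded tags such as <break> would not survive).

```python
from xml.sax.saxutils import escape


def text_to_ssml(text: str) -> str:
    """Wrap each non-empty line of plain text in <p>...</p> inside one <speak>.

    Assumption: every line break in the input marks a paragraph boundary.
    """
    # Drop blank lines and surrounding whitespace so stray newlines
    # (for example, one just before the end of the file) do not matter.
    paragraphs = [line.strip() for line in text.splitlines() if line.strip()]
    # escape() converts &, < and > so the result stays well-formed XML.
    body = "".join(f"<p>{escape(p)}</p>" for p in paragraphs)
    return f"<speak>{body}</speak>"


if __name__ == "__main__":
    sample = "Article 1\nVideo provides a powerful way to help you prove your point.\n"
    print(text_to_ssml(sample))
```

With explicit paragraph tags, the pause after "Article 1" no longer depends on how a raw line break or the position of </speak> happens to be interpreted.
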
3 answers, 0 votes, 14 views, asked 2 years ago

Inconsistent results with role tag

I'm trying to make Polly say 'live' (as in 'Live from New York!') using the role tag. It works fine when 'live' is at the beginning of the sentence, but if the word is buried in the middle of the sentence, the pronunciation changes. I'm using the Windows CLI and neural voices. My region is set to us-east-1.

I've also tried various workarounds using the phoneme tag, but I can't get anything to work. I think it's because I need to escape characters in the ph= option (like stress marks in IPA, and <? in X-SAMPA), but \ doesn't seem to work as an escape in the CLI. AWS keeps returning "Invalid SSML Request" when I attempt to escape anything.

Here are the role commands I'm sending:

```
aws polly synthesize-speech --output-format mp3 --engine neural --voice-id Joanna --text "<speak><amazon:domain name='conversational'><prosody rate='97%'><w role='amazon:SENSE_0'>Live</w> from New York, it's Saturday Night!<break time='.1s'/></prosody></amazon:domain></speak>" --text-type ssml good.mp3

aws polly synthesize-speech --output-format mp3 --engine neural --voice-id Joanna --text "<speak><amazon:domain name='conversational'><prosody rate='97%'>When sharing compatible files, select Request control. <break time='.35s'/>This allows you to <w role='amazon:SENSE_0'>live</w> edit the open documents during your video session.<break time='.1s'/></prosody></amazon:domain></speak>" --text-type ssml bad.mp3
```

And here are the various phoneme tags I've tried:

```
aws polly synthesize-speech --output-format mp3 --engine neural --voice-id Joanna --text "<speak><amazon:domain name='conversational'><prosody rate='97%'>When sharing compatible files, select Request control. <break time='.35s'/>This allows you to <phoneme alphabet='ipa' ph='līv'>live</phoneme> edit the open documents during your video session.<break time='.1s'/></prosody></amazon:domain></speak>" --text-type ssml live1.mp3

aws polly synthesize-speech --output-format mp3 --engine neural --voice-id Joanna --text "<speak><amazon:domain name='conversational'><prosody rate='97%'>When sharing compatible files, select Request control. <break time='.35s'/>This allows you to <phoneme alphabet='ipa' ph='laɪv'>live</phoneme> edit the open documents during your video session.<break time='.1s'/></prosody></amazon:domain></speak>" --text-type ssml live2.mp3

aws polly synthesize-speech --output-format mp3 --engine neural --voice-id Joanna --text "<speak><amazon:domain name='conversational'><prosody rate='97%'>When sharing compatible files, select Request control. <break time='.35s'/>This allows you to <phoneme alphabet='ipa' ph='lɪv'>live</phoneme> edit the open documents during your video session.<break time='.1s'/></prosody></amazon:domain></speak>" --text-type ssml live3.mp3

aws polly synthesize-speech --output-format mp3 --engine neural --voice-id Joanna --text "<speak><amazon:domain name='conversational'><prosody rate='97%'>When sharing compatible files, select Request control. <break time='.35s'/>This allows you to <phoneme alphabet='ipa' ph='\ˈlɪv'>live</phoneme> edit the open documents during your video session.<break time='.1s'/></prosody></amazon:domain></speak>" --text-type ssml live4.mp3

aws polly synthesize-speech --output-format mp3 --engine neural --voice-id Joanna --text "<speak><amazon:domain name='conversational'><prosody rate='97%'>When sharing compatible files, select Request control. <break time='.35s'/>This allows you to <phoneme alphabet='ipa' ph='"lɪv'>live</phoneme> edit the open documents during your video session.<break time='.1s'/></prosody></amazon:domain></speak>" --text-type ssml live5.mp3

aws polly synthesize-speech --output-format mp3 --engine neural --voice-id Joanna --text "<speak><amazon:domain name='conversational'><prosody rate='97%'>When sharing compatible files, select Request control. <break time='.35s'/>This allows you to <phoneme alphabet='x-sampa' ph='"lIv'>live</phoneme> edit the open documents during your video session.<break time='.1s'/></prosody></amazon:domain></speak>" --text-type ssml live6.mp3

aws polly synthesize-speech --output-format mp3 --engine neural --voice-id Joanna --text "<speak><amazon:domain name='conversational'><prosody rate='97%'>When sharing compatible files, select Request control. <break time='.35s'/>This allows you to <phoneme alphabet='x-sampa' ph='l<? ī ?>v'>live</phoneme> edit the open documents during your video session.<break time='.1s'/></prosody></amazon:domain></speak>" --text-type ssml live7.mp3
```

Edited by: sharonhuston on Apr 7, 2020 10:11 AM

More on this issue: I'm also getting inconsistent results between the web console and the CLI. Look at these two identical entries. The console says "live" correctly; the command line does not.

link:<https://learning.realpage.com/downloads/sharon/cil.png>
link:<https://learning.realpage.com/downloads/sharon/console.png>

Edited by: sharonhuston on Apr 7, 2020 1:29 PM
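
One way to take shell quoting out of the picture is to build the SSML in a script and let an XML helper do the attribute escaping. The sketch below is only an illustration under stated assumptions: the helper names `phoneme`, `build_ssml`, and `synthesize` are hypothetical, the IPA string `laɪv` is simply the one from the live2 attempt above, and `synthesize` assumes boto3 is installed and AWS credentials are configured. It does not by itself explain why the console and the CLI differ.

```python
from xml.sax.saxutils import escape, quoteattr


def phoneme(word: str, ph: str, alphabet: str = "ipa") -> str:
    """Return an SSML <phoneme> element with XML-safe attribute values."""
    # quoteattr() picks a quote style that fits the value and escapes
    # &, < and >, so stress marks such as X-SAMPA's " need no backslashes.
    return (
        f"<phoneme alphabet={quoteattr(alphabet)} ph={quoteattr(ph)}>"
        f"{escape(word)}</phoneme>"
    )


def build_ssml() -> str:
    """Build the mid-sentence 'live' example as one SSML string."""
    before = "This allows you to "
    after = " edit the open documents during your video session."
    return f"<speak>{escape(before)}{phoneme('live', 'laɪv')}{escape(after)}</speak>"


def synthesize(ssml: str, out_path: str = "live.mp3") -> None:
    """Send the SSML to Polly and save the MP3 (needs boto3 and credentials)."""
    import boto3  # imported here so the builders above work without it

    polly = boto3.client("polly", region_name="us-east-1")
    response = polly.synthesize_speech(
        Engine="neural",
        VoiceId="Joanna",
        OutputFormat="mp3",
        TextType="ssml",
        Text=ssml,
    )
    with open(out_path, "wb") as audio_file:
        audio_file.write(response["AudioStream"].read())


if __name__ == "__main__":
    print(build_ssml())
```

Because the string never passes through the Windows command interpreter, any remaining "Invalid SSML Request" error would point at the SSML itself rather than at quoting.
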
2 answers, 0 votes, 18 views, asked 3 years ago