Inconsistent results with role tag

I'm trying to make Polly say 'live' (as in 'Live from New York!') using the role tag. It works fine when live is at the beginning of the sentence. But if the word is buried in the middle of the sentence, the pronunciation changes.

I'm using the AWS CLI on Windows with Neural voices. My region is set to us-east-1.

I've also tried various workarounds using the phoneme tag, but I can't get anything to work. I think it's because I need to escape characters in the ph= attribute (like stress marks in IPA, and <? in X-SAMPA), but \ doesn't seem to work as an escape character in the CLI. AWS keeps returning "Invalid SSML Request" when I attempt to escape anything.

Here are the role commands I'm sending:

aws polly synthesize-speech --output-format mp3 --engine neural --voice-id Joanna --text "<speak><amazon:domain name='conversational'><prosody rate='97%'><w role='amazon:SENSE_0'>Live</w> from New York, it's Saturday Night!<break time='.1s'/></prosody></amazon:domain></speak>" --text-type ssml good.mp3

aws polly synthesize-speech --output-format mp3 --engine neural --voice-id Joanna --text "<speak><amazon:domain name='conversational'><prosody rate='97%'>When sharing compatible files, select Request control. <break time='.35s'/>This allows you to <w role='amazon:SENSE_0'>live</w> edit the open documents during your video session.<break time='.1s'/></prosody></amazon:domain></speak>" --text-type ssml bad.mp3

And here are the various phoneme tags I've tried:

aws polly synthesize-speech --output-format mp3 --engine neural --voice-id Joanna --text "<speak><amazon:domain name='conversational'><prosody rate='97%'>When sharing compatible files, select Request control. <break time='.35s'/>This allows you to <phoneme alphabet='ipa' ph='līv'>live</phoneme> edit the open documents during your video session.<break time='.1s'/></prosody></amazon:domain></speak>" --text-type ssml live1.mp3

aws polly synthesize-speech --output-format mp3 --engine neural --voice-id Joanna --text "<speak><amazon:domain name='conversational'><prosody rate='97%'>When sharing compatible files, select Request control. <break time='.35s'/>This allows you to <phoneme alphabet='ipa' ph='laɪv'>live</phoneme> edit the open documents during your video session.<break time='.1s'/></prosody></amazon:domain></speak>" --text-type ssml live2.mp3

aws polly synthesize-speech --output-format mp3 --engine neural --voice-id Joanna --text "<speak><amazon:domain name='conversational'><prosody rate='97%'>When sharing compatible files, select Request control. <break time='.35s'/>This allows you to <phoneme alphabet='ipa' ph='lɪv'>live</phoneme> edit the open documents during your video session.<break time='.1s'/></prosody></amazon:domain></speak>" --text-type ssml live3.mp3

aws polly synthesize-speech --output-format mp3 --engine neural --voice-id Joanna --text "<speak><amazon:domain name='conversational'><prosody rate='97%'>When sharing compatible files, select Request control. <break time='.35s'/>This allows you to <phoneme alphabet='ipa' ph='\ˈlɪv'>live</phoneme> edit the open documents during your video session.<break time='.1s'/></prosody></amazon:domain></speak>" --text-type ssml live4.mp3

aws polly synthesize-speech --output-format mp3 --engine neural --voice-id Joanna --text "<speak><amazon:domain name='conversational'><prosody rate='97%'>When sharing compatible files, select Request control. <break time='.35s'/>This allows you to <phoneme alphabet='ipa' ph='"lɪv'>live</phoneme> edit the open documents during your video session.<break time='.1s'/></prosody></amazon:domain></speak>" --text-type ssml live5.mp3

aws polly synthesize-speech --output-format mp3 --engine neural --voice-id Joanna --text "<speak><amazon:domain name='conversational'><prosody rate='97%'>When sharing compatible files, select Request control. <break time='.35s'/>This allows you to <phoneme alphabet='x-sampa' ph='"lIv'>live</phoneme> edit the open documents during your video session.<break time='.1s'/></prosody></amazon:domain></speak>" --text-type ssml live6.mp3

aws polly synthesize-speech --output-format mp3 --engine neural --voice-id Joanna --text "<speak><amazon:domain name='conversational'><prosody rate='97%'>When sharing compatible files, select Request control. <break time='.35s'/>This allows you to <phoneme alphabet='x-sampa' ph='l<? ī ?>v'>live</phoneme> edit the open documents during your video session.<break time='.1s'/></prosody></amazon:domain></speak>" --text-type ssml live7.mp3

Edited by: sharonhuston on Apr 7, 2020 10:11 AM

More on this issue: I'm also getting inconsistent results between the web console and the CLI. Look at these two identical entries. The console says "live" correctly; the command line does not.

link:https://learning.realpage.com/downloads/sharon/cil.png
link:https://learning.realpage.com/downloads/sharon/console.png

Edited by: sharonhuston on Apr 7, 2020 1:29 PM

asked 4 years ago, 190 views
2 Answers
Accepted Answer

Hi,

To most easily get "live" (rhyming with "dive"), you can use:

<w role="amazon:NN">live</w>

And to get "live" (rhyming with "give"), you can use:

<w role="amazon:VB">live</w>

The role amazon:SENSE_0 won't work, because the sense roles are numbered starting from amazon:SENSE_1. Sense roles are also typically used to disambiguate pronunciations that cannot be distinguished by part of speech (e.g. "bass" as in the frequency range or guitar, versus "bass" as in the fish). The two pronunciations of "live" differ by part of speech, so the sense roles wouldn't help here.
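As a rough sketch, the role tag can be applied to the sentence from the question and sent through boto3 instead of the CLI, which sidesteps shell quoting entirely. The helper names tag_word, build_ssml and synthesize are hypothetical, the amazon:domain and prosody wrappers are left out for brevity, and synthesize assumes boto3 is installed and AWS credentials are configured (it is defined here but not called).

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr


def tag_word(word: str, role: str) -> str:
    """Wrap a word in an SSML <w> element with the given role (hypothetical helper)."""
    return f"<w role={quoteattr(role)}>{escape(word)}</w>"


def build_ssml(before: str, word_markup: str, after: str) -> str:
    """Assemble a <speak> document around already-tagged markup (hypothetical helper)."""
    return f"<speak>{escape(before)}{word_markup}{escape(after)}</speak>"


def synthesize(ssml: str, out_path: str, voice: str = "Joanna") -> None:
    """Send SSML to Polly and save the MP3. Assumes boto3 and AWS credentials."""
    import boto3  # imported lazily so the rest of the sketch runs without it

    polly = boto3.client("polly", region_name="us-east-1")
    response = polly.synthesize_speech(
        Engine="neural",
        OutputFormat="mp3",
        Text=ssml,
        TextType="ssml",
        VoiceId=voice,
    )
    with open(out_path, "wb") as audio_file:
        audio_file.write(response["AudioStream"].read())


# amazon:NN selects the pronunciation that rhymes with "dive".
ssml = build_ssml(
    "This allows you to ",
    tag_word("live", "amazon:NN"),
    " edit the open documents during your video session.",
)

ET.fromstring(ssml)  # raises ParseError if the markup is not well-formed XML
print(ssml)
```

Calling synthesize(ssml, "live.mp3") would then write the audio file; swapping the role for amazon:VB should give the pronunciation that rhymes with "give".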

Regarding the <phoneme> tag, to get the equivalent of <w role="amazon:NN">live</w> (rhyming with "dive"), the following would be the canonical ways to do so:

<phoneme alphabet="ipa" ph="ˈlaɪv">live</phoneme>

<phoneme alphabet="x-sampa" ph="&quot;laIv">live</phoneme>

For the equivalent of <w role="amazon:VB">live</w> (rhyming with "give"), the following would be the canonical ways to do so:

<phoneme alphabet="ipa" ph="ˈlɪv">live</phoneme>

<phoneme alphabet="x-sampa" ph="&quot;lIv">live</phoneme>

The IPA primary stress mark character ˈ does not need to be escaped because it is not the same as the single quote character '.

However, in X-SAMPA the primary stress marker is the double quote character ("). Inside a double-quoted shell command it would end the argument early, so it has to be replaced with the equivalent XML entity: &quot;
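The escaping rules above can be checked locally, without calling Polly, by parsing the phoneme tags with a standard XML parser. This is only an illustration of how XML treats the two stress markers; the variable names are made up for the example, and the IPA stress mark is written as the escape \u02c8 so that it cannot be confused with an apostrophe.

```python
import unicodedata
import xml.etree.ElementTree as ET

IPA_STRESS = "\u02c8"  # IPA primary stress mark, a different character from ' (U+0027)

# IPA: the stress mark goes into the attribute value as-is.
ipa_tag = f'<phoneme alphabet="ipa" ph="{IPA_STRESS}la\u026av">live</phoneme>'

# X-SAMPA: the stress mark is ", so it is written as the entity &quot;.
xsampa_tag = '<phoneme alphabet="x-sampa" ph="&quot;laIv">live</phoneme>'

ipa_ph = ET.fromstring(ipa_tag).get("ph")
xsampa_ph = ET.fromstring(xsampa_tag).get("ph")

# The parser hands back the decoded attribute values.
print(unicodedata.name(ipa_ph[0]))  # MODIFIER LETTER VERTICAL LINE
print(xsampa_ph)                    # "laIv
```

The parser decodes &quot; back to a literal double quote, so the X-SAMPA value keeps its stress marker while the shell and the XML attribute quoting both stay intact.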

As for the inconsistency between the CLI and the console, thank you for reporting it. I was unable to reproduce it on my end. Even so, the suggestions above should give you the pronunciation you're looking for.

Edited by: anton-at-aws on Apr 10, 2020 7:28 AM

answered 4 years ago

Thank you!! This is really helpful!! It is nice to know that amazon:NN exists. The Supported SSML Tags page in the documentation only lists VB and VBD.

Edited by: sharonhuston on Apr 10, 2020 9:17 AM

answered 4 years ago
