AWS Polly Speech Generation: Is it possible to produce speech plus speech marks in one call?
I need both speech output (as .MP3) and speech marks output (as .JSON). Currently, I'm using two calls from the CLI based on the documentation, one call to generate each for the same text. I believe that means I'm getting billed twice for the same text. Is this correct? Are you billed twice, once for audio and a second time for speech marks? It also takes two calls and more time, and the server is presumably doing the same computation twice.

Is there a way to make a single call that generates both speech audio output (mp3) and speech marks (json), and/or a way to pay once rather than twice for the same text?

Related question: I also need multiple speech variants for the same text to allow for end-user preferences (e.g. different voices, different speed/prosody). Is there a way to batch generate multiple sets of speech output from a single call to decrease speech generation cost for this situation, rather than paying for 2x the amount of text for each small variant?

I would prefer to do this using the CLI, but it is also fine to use tasks, the JS API, the Python API, etc.

Here are the docs with examples of the calls to generate audio and speech marks:
https://docs.amazonaws.cn/en_us/polly/latest/dg/using-speechmarks.html
https://docs.amazonaws.cn/en_us/polly/latest/dg/get-started-cli-exercise.html

Thanks for your help
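For reference, here is a minimal Python sketch of the two-request pattern the question describes, assuming boto3 is installed and credentials are configured. As far as the documented SynthesizeSpeech parameters go, each request selects a single `OutputFormat`, so audio and speech marks are requested separately here. The helper names `build_requests` and `synthesize_both` and the default voice are illustrative only, not part of the Polly API, and the sketch says nothing about how the two requests are billed.

```python
def build_requests(text, voice_id="Joanna"):
    """Return (audio_params, marks_params) for two SynthesizeSpeech calls."""
    common = {"Text": text, "VoiceId": voice_id}
    audio_params = {**common, "OutputFormat": "mp3"}
    marks_params = {
        **common,
        "OutputFormat": "json",  # speech marks are returned as JSON lines
        "SpeechMarkTypes": ["sentence", "word"],
    }
    return audio_params, marks_params


def synthesize_both(text, voice_id="Joanna"):
    """Call Polly twice and return (mp3_bytes, speech_marks_text)."""
    import boto3  # imported lazily so build_requests works without AWS access

    polly = boto3.client("polly")
    audio_params, marks_params = build_requests(text, voice_id)
    audio = polly.synthesize_speech(**audio_params)["AudioStream"].read()
    marks = polly.synthesize_speech(**marks_params)["AudioStream"].read()
    return audio, marks.decode("utf-8")
```

Keeping the shared parameters in one place at least guarantees that both requests use exactly the same text and voice, so the marks line up with the audio.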
How do I properly format a lexicon for proper pronunciation? (Beginner)
Hello,

I'm trying to write a lexicon entry that gives the proper pronunciation for a word. I tried applying what was explained in the AWS Polly beginner video, to no avail; source: https://www.youtube.com/watch?v=B2XSU22ilmQ

The word I'm trying to translate is below. The voice I am using is en-GB (Amy); however, the pronunciation sounds are written strictly from the perspective of en-US.

* Word/Term: Mans Milias
* Native Spelling: Mans Mīļais
* Native Lang: Latvian
* Pronunciation: Maans.mEE.yais

My script:

~~~~
<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0"
      xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://www.w3.org/2005/01/pronunciation-lexicon
        http://www.w3.org/TR/2007/CR-pronunciation-lexicon-20071212/pls.xsd"
      alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>Mans Milias</grapheme>
    <grapheme>mans milias</grapheme>
    <grapheme>MANS MILIAS</grapheme>
    <phoneme>"Maans.mEE.yais</phoneme>
  </lexeme>
</lexicon>
~~~~

Thanks for any assistance! Advice welcomed! Brand new user to this software.

V
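A minimal Python sketch of generating such a lexicon, under two assumptions worth checking against the Polly documentation: that the `<phoneme>` value must use real symbols from the declared alphabet (IPA here), and that a lexicon is only applied when its `xml:lang` matches the language of the voice (en-GB for Amy). The IPA string below is only a rough guess derived from the "Maans.mEE.yais" spelling in the question, not a verified transcription, and `build_lexicon` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

PLS_NS = "http://www.w3.org/2005/01/pronunciation-lexicon"


def build_lexicon(graphemes, ipa, lang="en-GB"):
    """Return a PLS lexicon document with one lexeme."""
    grapheme_xml = "\n".join(
        f"    <grapheme>{escape(g)}</grapheme>" for g in graphemes
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f'<lexicon version="1.0" xmlns="{PLS_NS}"\n'
        f'         alphabet="ipa" xml:lang="{lang}">\n'
        "  <lexeme>\n"
        f"{grapheme_xml}\n"
        f"    <phoneme>{escape(ipa)}</phoneme>\n"
        "  </lexeme>\n"
        "</lexicon>\n"
    )


# Assumption: a rough IPA guess for the requested pronunciation.
lexicon = build_lexicon(
    ["Mans Milias", "mans milias", "MANS MILIAS"],
    "mɑːns.ˈmiː.jaɪs",
)

# Parsing the result confirms it is well-formed XML before uploading it.
ET.fromstring(lexicon.encode("utf-8"))
```

The resulting string could then be uploaded with boto3's `put_lexicon(Name=..., Content=lexicon)` and referenced by name through `LexiconNames` when synthesizing.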
Polly Abbreviation Pronunciation
Polly pronounces "can't" as "can" + "back slash". How can I get an abbreviation like 't to be pronounced properly?

    'OutputFormat' => 'mp3',
    'Text' => "<speak><prosody rate='medium' volume='x-loud'>I can't do it</prosody></speak>",
    'TextType' => 'ssml',
    'LanguageCode' => 'en-US',
    'VoiceId' => "Joanna",
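One possible cause, and this is an assumption because the question does not show how the string is built, is that the apostrophe reaches Polly with a literal backslash in front of it (for example after a slash-escaping step). An apostrophe needs no escaping inside XML element text, so a sketch like the following passes it through untouched and escapes only the characters XML requires. `to_ssml` is a hypothetical helper, shown in Python rather than the PHP used in the question.

```python
from xml.sax.saxutils import escape


def to_ssml(text, rate="medium", volume="x-loud"):
    """Wrap plain text in <speak><prosody> with XML-safe escaping."""
    # escape() rewrites &, < and >; the apostrophe is left as-is and no
    # backslash is ever added.
    body = escape(text)
    return (
        f'<speak><prosody rate="{rate}" volume="{volume}">'
        f"{body}</prosody></speak>"
    )


ssml = to_ssml("I can't do it")
print(ssml)
# <speak><prosody rate="medium" volume="x-loud">I can't do it</prosody></speak>
```

If the text that reaches Polly still contains `\'`, the backslash is being added somewhere before the request is sent.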
Polly - MP3 Quality
Hi, I'm just starting to test Polly out. So I typed some sample text and selected a voice, Matthew, and it sounded fine; just what I'm looking for. However, once I downloaded the mp3 file of the text being read (both neural and standard voice) the quality was not good. Both versions had massive amounts of static during the whole clip and only came out of one side of my headphones. Am I doing something wrong? Thanks for your help.
Amazon Polly changes intonation while playing audio.
Hello,

Amazon Polly changes intonation partway through playing audio.

Steps to reproduce:

1. Open https://us-west-2.console.aws.amazon.com/polly/home/SynthesizeSpeech?region=us-west-2#
2. Choose the voice Matthew, Male.
3. Turn on SSML.
4. Try to play this:

```
<speak>
"Hold it up to the light."I did so, and saw a large "E" with a small "g," a "P," and a large "G" with a small "t" woven into the texture of the paper. "What do you make of that?"
</speak>
```

You can hear that the second sentence is played with a different intonation. The issue occurs with every voice in the list of voices. If you remove `"g," a "P,"` from the second sentence, the intonation does not change.

Thanks.
Text to Speech Cognitive Service
Hello Team, I have a query regarding the [Text to Speech Service](https://www.speakatoo.com/) that we offer on our website, [https://www.speakatoo.com/](https://www.speakatoo.com/). We would like to add AWS TTS services to our website too and have a few questions about SSML tags. I see that there are a few special tags for advanced effects that can be applied to AWS standard voices, like the one below. **<amazon:effect name="whispered">**hello, how are you?**</amazon:effect>** Is there a way to control these tags on our end, as it is quite cumbersome to insert them by hand every time? Regards, Nish
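One way to avoid typing the tags by hand is to generate them in application code. The Python sketch below wraps plain text in the `amazon:effect` tag quoted above; `whisper` and `speak` are hypothetical helper names, not part of any AWS SDK.

```python
from xml.sax.saxutils import escape


def whisper(text):
    """Wrap plain text in Polly's whispered effect tag."""
    return f'<amazon:effect name="whispered">{escape(text)}</amazon:effect>'


def speak(*fragments):
    """Join already-built SSML fragments into one <speak> document."""
    return "<speak>" + "".join(fragments) + "</speak>"


ssml = speak(escape("Hello, "), whisper("how are you?"))
print(ssml)
# <speak>Hello, <amazon:effect name="whispered">how are you?</amazon:effect></speak>
```

A site could expose a simple control (for example a "whisper" toggle) and call a helper like this on the server, so end users never see the raw SSML.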
An error occurred (InvalidSsmlException) when calling the StartSpeechSynthesisTask operation: Invalid SSML request
Why does the response in case of an error omit any details, such as which parts of the provided SSML are invalid? I'm currently facing a weird situation: when I try to start a speech synthesis task (StartSpeechSynthesisTask) from a Python application using boto3, I get an InvalidSsmlException. But when I try the same input in the AWS Polly > Text-to-Speech console, it works flawlessly. I bet that if the exception included more info, I would find the answer on my own.
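Until the service reports more detail, a local pre-check can at least locate plain XML mistakes. The Python sketch below only tests well-formedness with the standard library's expat parser; it knows nothing about which tags or attributes Polly accepts, so passing this check does not guarantee the request will succeed. `check_ssml` is a hypothetical helper.

```python
from xml.parsers import expat


def check_ssml(ssml):
    """Return None if ssml is well-formed XML, else a message with position."""
    # A parser created without a namespace separator does no namespace
    # processing, so prefixed tags such as amazon:effect are accepted.
    parser = expat.ParserCreate()
    try:
        parser.Parse(ssml, True)
    except expat.ExpatError as err:
        reason = expat.ErrorString(err.code)
        return f"line {err.lineno}, column {err.offset}: {reason}"
    return None


print(check_ssml("<speak>Tom & Jerry</speak>"))  # reports the unescaped &
print(check_ssml("<speak>Tom &amp; Jerry</speak>"))  # None
```

Running this on the exact string the application passes to boto3 (rather than the text pasted into the console) may reveal a difference between the two inputs, such as an unescaped character or a truncated tag.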
Converting long text to audio with Polly using S3 synthesis - failure
Hi, I am trying to convert a book in a word doc to audio using Polly. I paste chunks of the book under 100K characters into Polly and then hit save to S3. It schedules and then runs the task but then after a few minutes the S3 Synthesis task just says "failed". What am I doing wrong? Perhaps something with naming/creation/location of an S3 output bucket? I don't even know what that is, I am just writing in a name. Do I need to pre-create a bucket somewhere? I am sure this is a totally basic question but thanks in advance for the help!
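A minimal Python sketch of starting the same kind of task with boto3 and reading back why it failed. It assumes, as I understand the StartSpeechSynthesisTask documentation, that the output bucket must already exist and be writable by the caller, so a name that was simply typed in is a likely culprit. `build_task_request` and `run_task` are hypothetical helpers, and the default voice and key prefix are placeholders.

```python
import time


def build_task_request(text, bucket, voice_id="Joanna", prefix="polly/"):
    """Return the parameters for StartSpeechSynthesisTask."""
    return {
        "Text": text,
        "VoiceId": voice_id,
        "OutputFormat": "mp3",
        "OutputS3BucketName": bucket,  # assumed to be an existing bucket
        "OutputS3KeyPrefix": prefix,
    }


def run_task(text, bucket, poll_seconds=10):
    """Start a task, wait for it, and return (status, reason, output_uri)."""
    import boto3  # imported lazily so build_task_request works offline

    polly = boto3.client("polly")
    request = build_task_request(text, bucket)
    task = polly.start_speech_synthesis_task(**request)["SynthesisTask"]
    while task["TaskStatus"] in ("scheduled", "inProgress"):
        time.sleep(poll_seconds)
        task = polly.get_speech_synthesis_task(TaskId=task["TaskId"])[
            "SynthesisTask"
        ]
    # TaskStatusReason is only present when there is something to report.
    return task["TaskStatus"], task.get("TaskStatusReason"), task.get("OutputUri")
```

When the status comes back as `failed`, the second element of the returned tuple should carry the reason text that the console summary hides.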