StartSpeechSynthesisTask is too slow


It looks like the audio becomes available in S3 only after it has been fully synthesized, so I will keep using my text-chunking code to get playback started sooner. Amazon, is it not possible to stream incomplete audio?
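The question doesn't show the chunking code. As a rough illustration only, a hypothetical `chunk_text` helper could split text at sentence boundaries so that each piece is synthesized separately and the first one can start playing while the others are still being processed. The 200-character default is an arbitrary assumption, not a Polly limit.

```python
import re


def chunk_text(text: str, max_chars: int = 200) -> list[str]:
    """Split text into chunks of at most max_chars, preferring sentence boundaries."""
    # Split after ., ! or ? followed by whitespace (a simple sentence heuristic).
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        # A single over-long sentence is hard-split at the last space that fits
        # (or at max_chars if there is no space).
        while len(sentence) > max_chars:
            cut = sentence.rfind(" ", 0, max_chars + 1)
            if cut <= 0:
                cut = max_chars
            piece, sentence = sentence[:cut].rstrip(), sentence[cut:].lstrip()
            if current:
                chunks.append(current)
                current = ""
            chunks.append(piece)
        if not sentence:
            continue
        # Pack whole sentences into the current chunk while they fit.
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Smaller chunks reduce time-to-first-audio but mean more requests; the sentence-boundary heuristic keeps the prosody of each piece natural.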

asked 5 years ago, 327 views
2 answers

Hi,
I wanted to use audio files synthesized by Polly in my Alexa skill, but had to abandon the idea.
Alexa holds a session for 8 seconds and then stops responding, but StartSpeechSynthesisTask takes at least 20-30 seconds, even for short sentences of 60-80 characters.
I also found in the Alexa Skills development documentation that this scenario is not allowed by AWS for security reasons: audio files used with the SSML <audio /> element must be publicly accessible, with no authentication required: https://developer.amazon.com/en-US/docs/alexa/custom-skills/speech-synthesis-markup-language-ssml-reference.html#audio
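For reference, a minimal sketch of the SSML an Alexa response would carry, assuming the audio file is already hosted at a public HTTPS URL. `ssml_with_audio` is a hypothetical helper: it only checks the URL scheme and escapes the values; it cannot verify that the file is actually publicly readable.

```python
from html import escape
from urllib.parse import urlparse


def ssml_with_audio(audio_url: str, text_before: str = "") -> str:
    """Build an SSML string that plays an audio file, optionally after some text."""
    # Alexa fetches <audio> sources itself, so the URL must be HTTPS and reachable
    # without credentials; only the scheme can be checked locally.
    if urlparse(audio_url).scheme != "https":
        raise ValueError("Alexa <audio> sources must be HTTPS URLs")
    parts = ["<speak>"]
    if text_before:
        parts.append(escape(text_before, quote=False))  # escape &, <, > in spoken text
    parts.append(f'<audio src="{escape(audio_url, quote=True)}"/>')
    parts.append("</speak>")
    return "".join(parts)
```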

I'd like to know whether there is any way to reduce the gap between calling StartSpeechSynthesisTask and the moment the S3 object is fully generated. Is the delay caused by guards, checks, or restrictions on the AWS side, or by the nature of the Polly and S3 services?

Thanks in advance!
Denis

answered 3 years ago
  • I'd also be interested in ways to reduce the waiting time. In my experience, though, the file becomes available in S3 after (almost exactly) 15 seconds rather than an arbitrary amount of time; the (future) file name is returned immediately by the StartSpeechSynthesisTask call.
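Instead of waiting a fixed delay, one option is to poll the task status. The sketch below assumes a boto3 Polly client (`boto3.client("polly")`) and uses `get_speech_synthesis_task`, whose `SynthesisTask.TaskStatus` moves through `scheduled` and `inProgress` to `completed` or `failed`. `wait_for_synthesis_task` and its poll interval and timeout are hypothetical choices; this only detects completion sooner, it does not make synthesis itself faster.

```python
import time


def wait_for_synthesis_task(polly, task_id: str, poll_seconds: float = 1.0,
                            timeout_seconds: float = 120.0) -> str:
    """Poll a Polly synthesis task until it completes and return its OutputUri.

    `polly` is assumed to be a boto3 Polly client (or any object with the same
    get_speech_synthesis_task method).
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        task = polly.get_speech_synthesis_task(TaskId=task_id)["SynthesisTask"]
        status = task["TaskStatus"]
        if status == "completed":
            return task["OutputUri"]  # S3 location of the finished audio
        if status == "failed":
            raise RuntimeError(task.get("TaskStatusReason", "synthesis task failed"))
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} still {status} after {timeout_seconds}s")
        time.sleep(poll_seconds)
```

Usage, with a placeholder bucket name: start the task with `polly.start_speech_synthesis_task(Text="Hello", OutputFormat="mp3", VoiceId="Joanna", OutputS3BucketName="my-bucket")`, then pass the returned `SynthesisTask["TaskId"]` to `wait_for_synthesis_task`.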


Hi zdanevich-vitaly-andreevich,

Thanks for contacting us!

By design, the audio file produced by StartSpeechSynthesisTask is available for download only once the task has finished.
In general, Amazon S3 never exposes partial objects, so an object cannot appear in S3 until it has been completely uploaded.
If your requirement is low latency, I recommend using the SynthesizeSpeech operation instead.
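A minimal sketch of the low-latency path, assuming a boto3 Polly client: `synthesize_speech` returns the audio directly in the response's `AudioStream`, so bytes can be consumed as they are read, with no S3 round trip. The voice `Joanna`, the MP3 format, and the helper name `synthesize_to_file` are illustrative choices.

```python
from contextlib import closing
from typing import BinaryIO


def synthesize_to_file(polly, text: str, out: BinaryIO, voice_id: str = "Joanna",
                       chunk_size: int = 4096) -> int:
    """Synthesize text with SynthesizeSpeech and stream the MP3 bytes into out.

    `polly` is assumed to be a boto3 Polly client. Returns the number of bytes written.
    """
    response = polly.synthesize_speech(Text=text, OutputFormat="mp3", VoiceId=voice_id)
    written = 0
    # AudioStream is a streaming body; read it incrementally instead of all at once.
    with closing(response["AudioStream"]) as stream:
        while True:
            chunk = stream.read(chunk_size)
            if not chunk:
                break
            out.write(chunk)  # a player could consume these bytes as they arrive
            written += len(chunk)
    return written
```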

On the other hand, if you're synthesizing very long texts and still care about latency, you can combine the two: synthesize the first chunk with SynthesizeSpeech and the rest with StartSpeechSynthesisTask.
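The combination could look roughly like this, again assuming a boto3 Polly client and text already split into chunks. The hypothetical `synthesize_hybrid` starts the asynchronous task for the remainder first, so it runs while the first chunk is synthesized synchronously; the bucket name is a placeholder supplied by the caller.

```python
from contextlib import closing


def synthesize_hybrid(polly, text_chunks: list[str], bucket: str,
                      voice_id: str = "Joanna"):
    """Return (audio bytes for the first chunk, task ID for the rest or None).

    `polly` is assumed to be a boto3 Polly client.
    """
    if not text_chunks:
        raise ValueError("need at least one text chunk")
    # Kick off the long-running task first so it overlaps with the sync call.
    remainder = " ".join(text_chunks[1:])
    task_id = None
    if remainder:
        task = polly.start_speech_synthesis_task(
            Text=remainder, OutputFormat="mp3", VoiceId=voice_id,
            OutputS3BucketName=bucket)["SynthesisTask"]
        task_id = task["TaskId"]
    # The first chunk comes back in the response itself, ready to play.
    first = polly.synthesize_speech(Text=text_chunks[0], OutputFormat="mp3",
                                    VoiceId=voice_id)
    with closing(first["AudioStream"]) as stream:
        first_audio = stream.read()
    return first_audio, task_id
```

The returned task ID can then be polled with `get_speech_synthesis_task` while the first chunk is playing.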

Thanks,
Hubert

answered 5 years ago
