StartSpeechSynthesisTask is too slow


It looks like the audio becomes available on S3 only after it has been fully synthesized, so I will keep using my text-chunking code to start playback sooner. Amazon, is it not possible to stream partially synthesized audio?

asked 4 years ago · 20 views
2 Answers

Hi zdanevich-vitaly-andreevich,

Thanks for contacting us!

By design, the audio file produced by StartSpeechSynthesisTask is available for download only once the task has finished.
Amazon S3 never exposes partial objects, so an object cannot appear in S3 until it has been completely uploaded.
If your requirement is low latency, I recommend using the SynthesizeSpeech operation instead.
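To illustrate the low-latency path, here is a minimal sketch that reads the SynthesizeSpeech response in small pieces so playback can begin before the whole audio has been received. It assumes a boto3 Polly client is passed in; the function names `stream_speech` and `main`, the `Joanna` voice, the chunk size, and the output file name are illustrative choices, not something prescribed by Polly.

```python
def stream_speech(polly, text, voice_id="Joanna", chunk_size=4096):
    """Yield MP3 bytes from a SynthesizeSpeech response, chunk by chunk.

    `polly` is expected to behave like boto3.client("polly").
    """
    response = polly.synthesize_speech(
        Text=text, OutputFormat="mp3", VoiceId=voice_id
    )
    stream = response["AudioStream"]  # file-like streaming body
    try:
        while True:
            chunk = stream.read(chunk_size)
            if not chunk:
                break  # end of audio
            yield chunk
    finally:
        stream.close()


def main():
    # Assumes boto3 is installed and AWS credentials are configured.
    import boto3

    polly = boto3.client("polly")
    with open("speech.mp3", "wb") as out:  # hypothetical output file
        for chunk in stream_speech(polly, "Hello from Polly."):
            out.write(chunk)  # or hand each chunk to an audio player
```

Calling `main()` writes the audio to a local file; in a real player each chunk would be fed to the playback buffer as it arrives.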

On the other hand, if you are synthesizing very long texts and still care about latency, you can combine the two: synthesize the first chunk with SynthesizeSpeech and the rest with StartSpeechSynthesisTask.
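The combined approach could look roughly like the sketch below. It assumes a boto3 Polly client and an existing S3 bucket that Polly is allowed to write to; `split_first_chunk`, `synthesize_hybrid`, the 300-character head size, and the `Joanna` voice are hypothetical names and values chosen for illustration. The asynchronous task is started first so that it runs in the background while the first chunk is synthesized and played.

```python
def split_first_chunk(text, max_chars=300):
    """Split text into a short head and the remainder.

    The head ends at the last sentence boundary within max_chars when
    one exists, otherwise at the last space, otherwise at max_chars.
    """
    text = text.strip()
    if len(text) <= max_chars:
        return text, ""
    window = text[:max_chars]
    cut = max(window.rfind(". "), window.rfind("! "), window.rfind("? "))
    if cut != -1:
        cut += 1  # keep the punctuation mark with the head
    else:
        cut = window.rfind(" ")
        if cut == -1:
            cut = max_chars  # no natural break point found
    return text[:cut].strip(), text[cut:].strip()


def synthesize_hybrid(polly, text, bucket, voice_id="Joanna", max_chars=300):
    """Return (audio stream for the head, task id for the rest or None)."""
    head, rest = split_first_chunk(text, max_chars)
    task_id = None
    if rest:
        # Start the long-running task first so it proceeds in the background.
        task = polly.start_speech_synthesis_task(
            Text=rest,
            OutputFormat="mp3",
            VoiceId=voice_id,
            OutputS3BucketName=bucket,
        )
        task_id = task["SynthesisTask"]["TaskId"]
    # Synthesize the head synchronously for immediate playback.
    first = polly.synthesize_speech(
        Text=head, OutputFormat="mp3", VoiceId=voice_id
    )
    return first["AudioStream"], task_id
```

The returned task id can then be checked with `get_speech_synthesis_task` until the remaining audio is ready in S3. Whether the head is long enough to cover that wait depends on the text and is not guaranteed by this sketch.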


answered 4 years ago

I wanted to use audio files synthesized by Polly in my Alexa skill, but had to give up on that.
Alexa waits about 8 seconds for a response and then stops responding, but StartSpeechSynthesisTask takes at least 20-30 seconds even for short sentences of 60-80 characters.
I found in the Alexa Skills development documentation that the described scenario is not allowed by AWS for security reasons. Audio files used with the SSML <audio /> element must be publicly accessible, with no authentication required.

I'd like to know whether there is any way to reduce the gap between calling StartSpeechSynthesisTask and the moment the S3 object is fully generated. Is the delay caused by guards, checks, or restrictions on the AWS side, or is it inherent to how Polly and S3 work?

Thanks in advance!

answered a year ago
