StartSpeechSynthesisTask is too slow

It looks like the audio is available in S3 only after it has been fully synthesized, so I will keep using my text-chunking code to get playback started sooner. Amazon, is it not possible to stream incomplete audio?
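The chunking code mentioned above isn't shown. A minimal sketch of the idea follows: split the text at sentence boundaries into pieces no longer than a limit, so each piece can be synthesized (and played) on its own. The name `chunk_text` and the 200-character default are hypothetical choices, not anything from Polly's API.

```python
import re


def chunk_text(text: str, max_chars: int = 200) -> list[str]:
    """Split text into chunks of at most max_chars, preferring sentence boundaries."""
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    # Split after '.', '!' or '?' followed by whitespace; punctuation stays with its sentence.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        # A single sentence longer than the limit is hard-split, on a space if possible.
        while len(sentence) > max_chars:
            cut = sentence.rfind(" ", 0, max_chars + 1)
            if cut <= 0:
                cut = max_chars
            piece, sentence = sentence[:cut].strip(), sentence[cut:].strip()
            if current:
                chunks.append(current)
                current = ""
            chunks.append(piece)
        # Greedily pack whole sentences into the current chunk while they fit.
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be sent to Polly in order. Playback of the first chunk can start while later chunks are still being synthesized.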

Asked 5 years ago, 302 views
2 answers

Hi,
I wanted to use audio files synthesized by Polly in my Alexa skill, but I had to abandon that approach.
An Alexa session waits only about 8 seconds before it stops responding, but StartSpeechSynthesisTask takes at least 20-30 seconds even for short sentences (60-80 characters).
I also found in the Alexa Skills development documentation that this scenario is not allowed for security reasons. Audio files used with the SSML <audio /> element must be publicly accessible and must not require authentication: https://developer.amazon.com/en-US/docs/alexa/custom-skills/speech-synthesis-markup-language-ssml-reference.html#audio
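To illustrate the requirement above, here is a small sketch that wraps a publicly readable audio URL in Alexa SSML. It assumes the audio is served over HTTPS, which the Alexa SSML reference requires for `<audio>` sources. The helper name `audio_ssml` is hypothetical.

```python
from xml.sax.saxutils import quoteattr


def audio_ssml(public_url: str) -> str:
    """Wrap a publicly readable HTTPS audio URL in an Alexa SSML <audio> element."""
    if not public_url.startswith("https://"):
        # Alexa only plays <audio> sources served over HTTPS.
        raise ValueError("audio URL must use https://")
    # quoteattr escapes &, < and > and wraps the value in quotes for use as an attribute.
    return f"<speak><audio src={quoteattr(public_url)}/></speak>"
```

The URL must work without any credentials, which is why an object in a private S3 bucket cannot be referenced directly.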

I'd like to know whether there is any way to reduce the gap between calling StartSpeechSynthesisTask and the moment the S3 object is fully written. Is the delay caused by guards, checks, or restrictions on the AWS side for this scenario, or by the nature of the Polly and S3 services?

Thanks in advance!
Denis

Answered 2 years ago
  • I'd also be interested in ways to reduce the waiting time. In my experience, though, the wait is not arbitrary. The file becomes available in S3 after almost exactly 15 seconds, even though the (future) file name is returned immediately by the StartSpeechSynthesisTask command.
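Rather than guessing a fixed delay, one option is to poll the task status until Polly reports it as `completed`. The sketch below assumes the response shape of boto3's `polly.get_speech_synthesis_task(TaskId=...)`, which returns a `SynthesisTask` dict with `TaskStatus` and `OutputUri`. The function `wait_for_task` and its timeout and interval defaults are hypothetical. The status call is passed in as a callable so the logic can be exercised without AWS.

```python
import time
from typing import Callable


def wait_for_task(
    get_task: Callable[[], dict],
    timeout_s: float = 60.0,
    interval_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll a Polly synthesis task until it completes; return its OutputUri."""
    deadline = time.monotonic() + timeout_s
    while True:
        task = get_task()["SynthesisTask"]
        status = task["TaskStatus"]  # 'scheduled', 'inProgress', 'completed' or 'failed'
        if status == "completed":
            return task["OutputUri"]
        if status == "failed":
            raise RuntimeError(task.get("TaskStatusReason", "synthesis task failed"))
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task still {status!r} after {timeout_s}s")
        sleep(interval_s)
```

With boto3 this could be called as `wait_for_task(lambda: polly.get_speech_synthesis_task(TaskId=task_id))`. Polling only detects completion sooner. It does not make the synthesis itself any faster.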


Hi zdanevich-vitaly-andreevich,

Thanks for contacting us!

By design, the audio file produced by StartSpeechSynthesisTask is available for download only once the task has finished.
Amazon S3 never exposes partially written objects, so an object cannot appear in S3 before it has been completely uploaded.
If your requirement is low latency, I recommend using the SynthesizeSpeech operation instead.
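A minimal sketch of the low-latency path follows. It assumes a boto3 Polly client (`boto3.client("polly")`), whose `synthesize_speech` returns the audio as a readable `AudioStream`. The helper `synthesize_to`, the voice `Joanna`, MP3 output, and the 4096-byte read size are illustrative assumptions. The client is passed in so any object with the same method can stand in for it.

```python
from typing import BinaryIO


def synthesize_to(client, text: str, sink: BinaryIO, voice_id: str = "Joanna",
                  chunk_size: int = 4096) -> int:
    """Call Polly SynthesizeSpeech and copy the audio to sink as it arrives; return bytes written."""
    response = client.synthesize_speech(Text=text, OutputFormat="mp3", VoiceId=voice_id)
    stream = response["AudioStream"]
    written = 0
    try:
        # Read incrementally so the consumer can start using audio before the stream ends.
        while True:
            block = stream.read(chunk_size)
            if not block:
                break
            sink.write(block)
            written += len(block)
    finally:
        stream.close()
    return written
```

Because the audio comes back in the HTTP response rather than via S3, there is no wait for an object to appear in a bucket.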

On the other hand, if you are synthesizing very long texts and still care about latency, you can combine the two: synthesize the first chunk with SynthesizeSpeech and the rest with StartSpeechSynthesisTask.
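The combination described above could look roughly like the sketch below. It assumes the boto3 Polly client methods `synthesize_speech` and `start_speech_synthesis_task` (with `OutputS3BucketName`). The helpers `split_head` and `hybrid_synthesize`, the 300-character head size, the voice, and the bucket name are hypothetical.

```python
import re


def split_head(text: str, head_chars: int = 300) -> tuple[str, str]:
    """Split text into a short head (ending at a sentence boundary if possible) and the tail."""
    text = text.strip()
    if len(text) <= head_chars:
        return text, ""
    # Sentence ends ('.', '!' or '?' followed by whitespace) within the first head_chars chars.
    ends = [m.end() for m in re.finditer(r"[.!?](?=\s)", text[: head_chars + 1])]
    cut = ends[-1] if ends else head_chars
    return text[:cut].strip(), text[cut:].strip()


def hybrid_synthesize(client, text: str, bucket: str, voice_id: str = "Joanna",
                      head_chars: int = 300):
    """Synthesize the head synchronously; hand the tail to an asynchronous S3 task."""
    head, tail = split_head(text, head_chars)
    # Low-latency path: audio for the head comes back directly in the response.
    head_audio = client.synthesize_speech(
        Text=head, OutputFormat="mp3", VoiceId=voice_id)["AudioStream"].read()
    task_id = None
    if tail:
        # Long-running path: Polly writes the rest to S3 once it is fully synthesized.
        task = client.start_speech_synthesis_task(
            Text=tail, OutputFormat="mp3", VoiceId=voice_id, OutputS3BucketName=bucket)
        task_id = task["SynthesisTask"]["TaskId"]
    return head_audio, task_id
```

The head can be played immediately while the task for the tail runs. The returned task ID can then be polled until the S3 object is ready.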

Thanks,
Hubert

Answered 5 years ago