StartSpeechSynthesisTask is too slow

It looks like the audio becomes available on S3 only after it has been fully synthesized, so I will keep using my text-chunking code to start playback sooner. Amazon, is it not possible to stream incomplete audio?

Asked 5 years ago · 327 views
2 Answers
Hi,
I wanted to use audio files synthesized by Polly in my Alexa skill, but had to abandon the idea.
Alexa keeps a session open for 8 seconds and then stops responding, but StartSpeechSynthesisTask takes at least 20-30 seconds even for short sentences of 60-80 characters.
I also found in the Alexa Skills development documentation that this scenario is not allowed by AWS for security reasons: audio files used with the SSML <audio /> element must be publicly accessible, with no authentication required: https://developer.amazon.com/en-US/docs/alexa/custom-skills/speech-synthesis-markup-language-ssml-reference.html#audio

I'm interested in whether there is any way to reduce the gap between calling StartSpeechSynthesisTask and the moment the S3 object is fully generated. Is the delay caused by guards, checks, or restrictions on the AWS side, or is it inherent to the Polly and S3 services?

Thanks in advance!
Denis

Answered 3 years ago
  • I'd also be interested in possible ways to reduce the waiting time. In my experience, though, I have to wait almost exactly 15 seconds, not an arbitrary number of seconds, until the file becomes available on S3. (The future file name is returned immediately by the StartSpeechSynthesisTask call.)
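
    Rather than sleeping for a fixed time, the task status can be polled with GetSpeechSynthesisTask. Below is a minimal sketch, assuming a boto3 Polly client is passed in as `polly`; the helper name `wait_for_task`, the poll interval, and the timeout are illustrative choices, not anything prescribed by Polly.

```python
import time


def wait_for_task(polly, task_id, poll_interval=1.0, timeout=60.0, sleep=time.sleep):
    """Poll a Polly synthesis task until it completes and return its OutputUri.

    `polly` is assumed to be a boto3 Polly client, e.g. boto3.client("polly").
    `sleep` is injectable so the loop can be exercised without real waiting.
    """
    waited = 0.0
    while True:
        task = polly.get_speech_synthesis_task(TaskId=task_id)["SynthesisTask"]
        status = task["TaskStatus"]  # scheduled | inProgress | completed | failed
        if status == "completed":
            return task["OutputUri"]
        if status == "failed":
            raise RuntimeError(task.get("TaskStatusReason", "synthesis task failed"))
        if waited >= timeout:
            raise TimeoutError(f"task {task_id} not finished after {timeout}s")
        sleep(poll_interval)
        waited += poll_interval
```

    Polling does not make synthesis itself any faster; it only avoids waiting longer than necessary once the object is ready.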

Hi zdanevich-vitaly-andreevich,

Thanks for contacting us!

By design, the audio file produced by StartSpeechSynthesisTask becomes available for download only once the task has finished.
In general, Amazon S3 never exposes partial objects, so an object cannot appear in S3 until it has been completely uploaded.
If your requirement is low latency, then I recommend using the SynthesizeSpeech operation.
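
As a rough illustration of the low-latency path, here is a minimal sketch that reads the SynthesizeSpeech response in chunks so playback can begin before the whole clip has been received. It assumes a boto3 Polly client is passed in as `polly`; the function name `stream_speech`, the voice "Joanna", and the chunk size are illustrative assumptions.

```python
def stream_speech(polly, text, voice_id="Joanna", chunk_size=4096):
    """Yield MP3 bytes from SynthesizeSpeech chunk by chunk.

    `polly` is assumed to be a boto3 Polly client, e.g. boto3.client("polly").
    The response's AudioStream is a file-like object that supports read(n).
    """
    response = polly.synthesize_speech(
        Text=text, OutputFormat="mp3", VoiceId=voice_id
    )
    stream = response["AudioStream"]
    try:
        while True:
            chunk = stream.read(chunk_size)
            if not chunk:  # empty bytes means the stream is exhausted
                break
            yield chunk
    finally:
        stream.close()
```

Each yielded chunk can be handed to a player or written to an HTTP response as it arrives, for example `for chunk in stream_speech(boto3.client("polly"), "Hello"): ...`.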

On the other hand, if you're synthesizing very long texts and still care about latency, you can combine the two: synthesize the first chunk using SynthesizeSpeech and the rest using StartSpeechSynthesisTask.
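
The combination could look roughly like the sketch below. It assumes a boto3 Polly client is passed in as `polly` and that the first sentence is a reasonable "first chunk"; the helper names `split_first_sentence` and `synthesize_hybrid`, the naive sentence regex, the voice "Joanna", and the bucket argument are all illustrative assumptions.

```python
import re


def split_first_sentence(text):
    """Split text into (first sentence, remainder) at the first '.', '!' or '?'
    that is followed by whitespace. Deliberately naive; abbreviations will
    confuse it."""
    match = re.search(r"[.!?]\s+", text)
    if match is None:
        return text.strip(), ""
    return text[:match.end()].strip(), text[match.end():].strip()


def synthesize_hybrid(polly, text, bucket, voice_id="Joanna"):
    """Return (audio stream for the first sentence, async task info or None)."""
    first, rest = split_first_sentence(text)
    task = None
    if rest:
        # Start the slow asynchronous task first so it runs in the background
        # while the first sentence is being fetched and played.
        task = polly.start_speech_synthesis_task(
            Text=rest,
            OutputFormat="mp3",
            VoiceId=voice_id,
            OutputS3BucketName=bucket,
        )["SynthesisTask"]
    head = polly.synthesize_speech(
        Text=first, OutputFormat="mp3", VoiceId=voice_id
    )
    return head["AudioStream"], task
```

The returned task dictionary carries the `TaskId` and `OutputUri`, so the caller can fetch the rest of the audio from S3 once the task has completed.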

Thanks,
Hubert

Answered 5 years ago