Polly: GetSpeechSynthesisTask return length of audio


Can we get the duration of the audio returned in the success response?

Asked 2 years ago, 886 views
2 Answers

Hi! Amazon Polly does not currently return the length of the audio. You would have to use a third-party tool or library such as ffmpeg or mutagen to obtain the length from the audio file. Hope this helps! :)
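As a rough illustration of the library route, here is a minimal sketch that reads the duration of a downloaded audio file with mutagen. It assumes mutagen is installed (`pip install mutagen`) and that the Polly output has already been saved locally; the function names and the path `speech.mp3` are hypothetical, chosen only for this example.

```python
def audio_duration_seconds(path):
    """Return the duration of an audio file in seconds using mutagen."""
    # Imported lazily so the module still loads when mutagen is absent.
    from mutagen import File  # third-party: pip install mutagen

    audio = File(path)  # returns None if the file type is not recognised
    if audio is None or audio.info is None:
        raise ValueError(f"could not read audio metadata from {path!r}")
    return audio.info.length  # length in seconds, as a float


def format_duration(seconds):
    """Format a duration in seconds as M:SS.mmm for display."""
    minutes, secs = divmod(seconds, 60)
    return f"{int(minutes)}:{secs:06.3f}"


if __name__ == "__main__":
    # Hypothetical local copy of the file produced by the synthesis task.
    print(format_duration(audio_duration_seconds("speech.mp3")))
```

For an asynchronous synthesis task, the file would first need to be downloaded from the S3 location given in the task's output URI before it can be inspected this way.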

AWS
Dani M
Answered 2 years ago
  • Hey, did this work for you? Remember to accept the answer if it has helped you (^_^)


I can provide an alternative to Dani's suggestion. There is an approach that lets you determine the length of generated Polly audio using nothing more than the Polly API, though it is a little non-obvious. You can request that Polly return "speech marks", which are metadata describing the synthesized speech. One type of speech mark you can request is the "viseme". Visemes are timestamped descriptions of the face and mouth movements that a human would use to make each component sound in the generated audio. The viseme data will always end with a viseme labelled "sil", which represents silence. The timestamp of this last viseme marks the end of the final sound in the generated audio.

Here's the viseme metadata returned for the phrase "Hello, world"...

{"time":125,"type":"viseme","value":"k"}
{"time":200,"type":"viseme","value":"@"}
{"time":237,"type":"viseme","value":"t"}
{"time":300,"type":"viseme","value":"o"}
{"time":375,"type":"viseme","value":"sil"}
{"time":562,"type":"viseme","value":"u"}
{"time":750,"type":"viseme","value":"E"}
{"time":837,"type":"viseme","value":"t"}
{"time":1000,"type":"viseme","value":"t"}
{"time":1212,"type":"viseme","value":"sil"}

Viseme timestamps are expressed in milliseconds. The final "sil" viseme is at 1212 ms, so, given the voice I selected when generating the audio, the generated audio is 1.212 seconds in duration.
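The calculation above can be sketched in Python. This is only an illustration under a few assumptions: the function names are made up for this example, `fetch_viseme_marks` assumes boto3 is installed with AWS credentials configured, and "Joanna" is just an example voice. Speech marks are requested separately from the audio (with the JSON output format), so the request should use the same text and voice as the audio request for the timestamps to match.

```python
import json


def duration_from_speech_marks(speech_marks_text):
    """Return the timestamp of the last speech mark, in seconds.

    Polly returns speech marks as one JSON object per line.
    """
    last_time_ms = None
    for line in speech_marks_text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        mark = json.loads(line)
        last_time_ms = mark["time"]  # milliseconds from the start of the audio
    if last_time_ms is None:
        raise ValueError("no speech marks found")
    return last_time_ms / 1000.0


def fetch_viseme_marks(text, voice_id="Joanna"):
    """Request viseme speech marks from Polly (assumes boto3 and credentials)."""
    import boto3  # third-party: pip install boto3

    polly = boto3.client("polly")
    response = polly.synthesize_speech(
        Text=text,
        VoiceId=voice_id,          # example voice; use the one used for the audio
        OutputFormat="json",       # speech marks are returned as JSON lines
        SpeechMarkTypes=["viseme"],
    )
    return response["AudioStream"].read().decode("utf-8")


# The last three records from the "Hello, world" example above.
SAMPLE_MARKS = """\
{"time":837,"type":"viseme","value":"t"}
{"time":1000,"type":"viseme","value":"t"}
{"time":1212,"type":"viseme","value":"sil"}
"""

if __name__ == "__main__":
    print(duration_from_speech_marks(SAMPLE_MARKS))  # 1.212
```

With live credentials, `duration_from_speech_marks(fetch_viseme_marks("Hello, world"))` would give the figure for whichever voice is selected; the exact value will differ between voices.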

AWS
Kris
Answered 2 years ago
