Announcing a new generative engine and three voices, now generally available on Amazon Polly


The new generative engine is a significant advancement for Amazon Polly. It is trained on a diverse range of data sources, including publicly available and proprietary data, covering various voices, languages, and styles. It excels at rendering context-dependent prosody, including pausing, spelling, dialectal properties, foreign word pronunciation, and more.

Amazon Polly is a machine learning (ML) service that transforms text into lifelike speech, commonly referred to as text-to-speech (TTS) technology. Amazon Polly offers high-quality, natural-sounding voices in numerous languages, so you can choose a suitable voice for your speech-enabled applications across different regions and languages.

With Amazon Polly, you gain access to various voice options, including neural, long-form, and generative voices. These voices represent groundbreaking advancements in speech quality, delivering human-like, highly expressive, and emotionally adept voices. You can customize the speech output by adjusting parameters such as speech rate, pitch, or volume using Speech Synthesis Markup Language (SSML) tags, and quickly deploy lifelike voices and conversational user experiences with consistently fast response times.
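As an illustration of the SSML customization mentioned above, here is a minimal sketch that wraps text in a `prosody` tag. The helper name `build_prosody_ssml` is hypothetical and not part of any AWS SDK, and support for individual prosody attributes varies by engine, so check the Amazon Polly SSML documentation for the voice you use.

```python
from xml.sax.saxutils import escape, quoteattr


def build_prosody_ssml(text, rate=None, volume=None, pitch=None):
    """Wrap plain text in <speak> and, if any attribute is set, <prosody>.

    Hypothetical helper for illustration only. Attribute values such as
    "slow" or "loud" follow the SSML prosody conventions; which attributes
    a given Polly engine accepts is an assumption to verify in the docs.
    """
    attrs = {"rate": rate, "volume": volume, "pitch": pitch}
    # Keep only the attributes the caller actually provided.
    attr_text = "".join(
        f" {name}={quoteattr(value)}"
        for name, value in attrs.items()
        if value is not None
    )
    # Escape &, <, and > so the input text cannot break the markup.
    body = escape(text)
    if attr_text:
        body = f"<prosody{attr_text}>{body}</prosody>"
    return f"<speak>{body}</speak>"
```

The resulting string would be passed to Polly as the text input with the text type set to SSML instead of plain text.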

What's New with the Generative Engine?

Amazon Polly now supports four voice engines: standard, neural, long-form, and generative.

Standard TTS Voices: Introduced in 2016, these voices utilize traditional concatenative synthesis, stringing together phonemes of recorded speech to produce natural-sounding synthesized speech.

Neural TTS (NTTS) Voices: Introduced in 2019, NTTS voices utilize a sequence-to-sequence neural network to convert phoneme sequences into spectrograms, resulting in even higher quality human-like voices compared to standard voices.

Long-Form Voices: Introduced in 2023, these voices leverage cutting-edge deep learning TTS technology to engage listeners for longer content such as news articles, training materials, or marketing videos.

Generative Voices: In February 2024, Amazon scientists introduced a groundbreaking research TTS model called Big Adaptive Streamable TTS with Emergent abilities (BASE). Leveraging this technology, the Polly Generative engine can create human-like synthetically generated voices, suitable for various applications such as customer assistance, virtual training, or marketing.

The new generative voices available are:

Ruth: Locale - en_US, Gender - Female, Language - English (US)

Matthew: Locale - en_US, Gender - Male, Language - English (US)

Amy: Locale - en_GB, Gender - Female, Language - English (British)

These generative voices offer a wide range of applications and use cases. To learn more about the generative engine, please visit the Generative Voices section in the AWS documentation.

Getting Started with Generative Voices

You can access the new voices using the AWS Management Console, AWS Command Line Interface (AWS CLI), or the AWS SDKs.

To get started, navigate to the Amazon Polly console in the US East (N. Virginia) Region and choose the Text-to-Speech menu in the left pane. Select the voice of your choice, such as Ruth or Matthew in English (US), or Amy in English (British), and choose the Generative engine. Enter your text, then listen to or download the generated voice output.

Using the CLI, you can list the voices that utilize the new generative engine and synthesize sample text to an audio file with the supported voice ID.
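The same two steps can also be done through the AWS SDKs. The following is a minimal sketch using the AWS SDK for Python (boto3), assuming credentials are already configured and the account can use Polly in us-east-1. The function names `generative_voice_ids` and `synthesize_to_file` are hypothetical helpers written for this example, not part of the SDK.

```python
def generative_voice_ids(describe_voices_response):
    """Return IDs of voices whose SupportedEngines include "generative".

    Expects a dict shaped like the response of Polly's DescribeVoices call.
    """
    return [
        voice["Id"]
        for voice in describe_voices_response.get("Voices", [])
        if "generative" in voice.get("SupportedEngines", [])
    ]


def synthesize_to_file(text, voice_id, path, region="us-east-1"):
    """Synthesize text with the generative engine and save it as MP3.

    Makes a real, billable API call; requires boto3 and AWS credentials.
    """
    import boto3  # imported lazily so the filter above works without the SDK

    polly = boto3.client("polly", region_name=region)
    response = polly.synthesize_speech(
        Text=text,
        VoiceId=voice_id,
        Engine="generative",
        OutputFormat="mp3",
    )
    # AudioStream is a streaming body; read it fully and write the bytes out.
    with open(path, "wb") as audio_file:
        audio_file.write(response["AudioStream"].read())
```

To list the voices, pass the result of `boto3.client("polly", region_name="us-east-1").describe_voices()` to `generative_voice_ids`; this sketch ignores pagination of that response for brevity. A call such as `synthesize_to_file("Hello from Polly.", "Matthew", "hello.mp3")` would then write the audio to a local file.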

The new generative voices of Amazon Polly are now available in the US East (N. Virginia) Region. You pay only for what you use, based on the number of characters of text that you convert to speech. To learn more, please visit the Amazon Polly Pricing page.
