How to stream audio from Amazon Chime SDK video meetings to Kinesis Video Streams for real-time transcription?

We’re planning to build a solution using the Amazon Chime SDK to host video meetings in our web application. Our goal is to capture meeting audio in real time and send it to Amazon Kinesis Video Streams (KVS) so we can run live transcription and other downstream processing.

We initially looked into Chime Media Capture Pipelines, but according to the documentation they are limited to 10 concurrent pipelines per account. This makes them unsuitable for our use case, which requires support for more than 10 concurrent meetings.

Question: Is there a recommended architectural pattern for:

  • Hosting video meetings using the Amazon Chime SDK,
  • Accessing the live audio stream from each meeting, and
  • Sending it to Amazon Kinesis Video Streams (or another service) in real time without Chime Media Capture Pipelines?

We’re aiming to use managed AWS services as much as possible. Any guidance or reference architectures would be appreciated!

2 Answers

Please note that the default limits are per Region: us-east-1 allows 100 (not 10) concurrent media pipelines by default, and every other Region allows 10. See https://docs.aws.amazon.com/chime-sdk/latest/dg/media-pipelines-limits.html

Also please note that these are just the default SOFT limits. These quotas can be adjusted to accommodate your business requirements. Please make a request through the Service Quotas page in the AWS Console.
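For anyone who prefers to script the quota request rather than use the console, here is a minimal sketch against the Service Quotas API. The helper names are hypothetical, and the service code and quota name fragment you pass in are assumptions to verify in your own account; the client is expected to be `boto3.client("service-quotas")`.

```python
def find_quotas(client, service_code, name_fragment):
    """Return quotas whose name contains name_fragment (case-insensitive)."""
    matches = []
    kwargs = {"ServiceCode": service_code}
    while True:
        page = client.list_service_quotas(**kwargs)
        for quota in page.get("Quotas", []):
            if name_fragment.lower() in quota["QuotaName"].lower():
                matches.append(quota)
        token = page.get("NextToken")
        if not token:
            return matches
        # Follow pagination until no token is returned.
        kwargs["NextToken"] = token


def request_increase(client, service_code, quota_code, desired_value):
    """Submit a quota increase request; DesiredValue must be a float."""
    return client.request_service_quota_increase(
        ServiceCode=service_code,
        QuotaCode=quota_code,
        DesiredValue=float(desired_value),
    )


# Usage (assumption: "chime" is the service code that holds these quotas):
#   import boto3
#   sq = boto3.client("service-quotas", region_name="us-east-1")
#   for q in find_quotas(sq, "chime", "pipeline"):
#       print(q["QuotaCode"], q["QuotaName"], q["Value"])
```

Listing the quotas first avoids hard-coding a quota code, which differs per quota and is not given in this thread.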

For sending meeting audio to KVS, media capture pipelines are the appropriate solution and you should not let the default limit prevent you from proceeding with your application.
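As a rough illustration of what a pipeline request might look like, the sketch below builds the arguments for the media pipelines `CreateMediaStreamPipeline` call, which streams meeting audio into a Kinesis Video Streams pool. The helper names are hypothetical, the ARNs are placeholders you must supply, and the client is assumed to be `boto3.client("chime-sdk-media-pipelines")`; check the field names against the current API reference before relying on them.

```python
def build_stream_pipeline_request(meeting_arn, kvs_pool_arn,
                                  per_attendee=False, reserved_streams=1):
    """Build kwargs for create_media_stream_pipeline.

    per_attendee=True requests one audio stream per attendee,
    otherwise a single mixed-audio stream for the meeting.
    """
    return {
        "Sources": [
            {"SourceType": "ChimeSdkMeeting", "SourceArn": meeting_arn}
        ],
        "Sinks": [
            {
                "SinkArn": kvs_pool_arn,
                "SinkType": "KinesisVideoStreamPool",
                "ReservedStreamCapacity": reserved_streams,
                "MediaStreamType": (
                    "IndividualAudio" if per_attendee else "MixedAudio"
                ),
            }
        ],
    }


def start_stream_pipeline(client, meeting_arn, kvs_pool_arn, **options):
    """Start the pipeline using an injected boto3 client."""
    request = build_stream_pipeline_request(meeting_arn, kvs_pool_arn, **options)
    return client.create_media_stream_pipeline(**request)


# Usage (placeholder ARNs):
#   import boto3
#   pipelines = boto3.client("chime-sdk-media-pipelines")
#   start_stream_pipeline(pipelines, MEETING_ARN, KVS_POOL_ARN)
```

Each running pipeline counts against the concurrent pipeline quota, so one pipeline per meeting still needs the quota raised to match the expected number of simultaneous meetings.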

However, if you are only processing the audio for transcription, you will get better results using the native integration with Amazon Transcribe. The Amazon Chime SDK live transcription feature will manage this for you. (https://docs.aws.amazon.com/chime-sdk/latest/dg/meeting-transcription.html)
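A minimal sketch of starting live transcription from a backend, assuming the meeting was created with the `chime-sdk-meetings` API and that `en-US` is the desired language. The helper names are hypothetical; the client is expected to be `boto3.client("chime-sdk-meetings")`.

```python
def build_transcription_request(meeting_id, language_code="en-US"):
    """Build kwargs for start_meeting_transcription using Amazon Transcribe."""
    return {
        "MeetingId": meeting_id,
        "TranscriptionConfiguration": {
            "EngineTranscribeSettings": {"LanguageCode": language_code}
        },
    }


def start_live_transcription(client, meeting_id, language_code="en-US"):
    """Start transcription for a meeting using an injected boto3 client."""
    request = build_transcription_request(meeting_id, language_code)
    return client.start_meeting_transcription(**request)


# Usage:
#   import boto3
#   meetings = boto3.client("chime-sdk-meetings")
#   start_live_transcription(meetings, meeting_id)
```

With this approach no pipeline is created for the audio; transcripts are delivered to meeting clients by the SDK, so it only fits if transcription is the sole downstream need.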

AWS
answered 12 days ago

Yes, there is a recommended architectural pattern for your use case that doesn't rely on Chime Media Capture Pipelines. You can use the Amazon Chime SDK's audio analysis feature to stream audio directly from your WebRTC sessions to Amazon Kinesis Video Streams for real-time processing and transcription.

Here's a high-level overview of the recommended approach:

  1. Host video meetings using the Amazon Chime SDK in your web application.

  2. Utilize the SDK's audio analysis feature to send the WebRTC session audio to Amazon Kinesis Video Streams.

  3. You have two options for streaming the audio:
       a. Send each attendee's audio stream to a separate Kinesis Video Stream for individual analysis.
       b. Send a single mixed-audio stream with the meeting's audio.

  4. Once the audio is in Kinesis Video Streams, you can perform live analysis using AWS machine learning services, including real-time transcription.

This approach allows you to bypass the limitations of Media Capture Pipelines and support more than 10 concurrent meetings. It leverages managed AWS services as requested, utilizing the Amazon Chime SDK for video meetings and Kinesis Video Streams for audio processing.

For transcription, you can integrate Amazon Transcribe with your Kinesis Video Streams. The Amazon Chime SDK supports real-time meeting transcriptions through integration with Amazon Transcribe or Amazon Transcribe Medical. While this is typically done within the SDK, you can achieve a similar result by processing the audio streams from Kinesis Video Streams.

Keep in mind that audio sent to Kinesis typically uses a sampling rate of 8 kHz, which is suitable for transcription purposes.
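To put the 8 kHz figure in concrete terms, here is a small worked calculation of the raw data rate. It assumes uncompressed 16-bit mono PCM, which is an assumption for illustration and not something stated above; the actual encoding of your stream may differ.

```python
def pcm_bytes_per_second(sample_rate_hz, bits_per_sample=16, channels=1):
    """Raw PCM data rate: samples/s * bytes per sample * channels."""
    return sample_rate_hz * (bits_per_sample // 8) * channels


def chunk_size_bytes(sample_rate_hz, chunk_ms, bits_per_sample=16, channels=1):
    """Number of bytes in a chunk of chunk_ms milliseconds of audio."""
    rate = pcm_bytes_per_second(sample_rate_hz, bits_per_sample, channels)
    return rate * chunk_ms // 1000


print(pcm_bytes_per_second(8000))   # 16000 bytes per second
print(chunk_size_bytes(8000, 100))  # 1600 bytes per 100 ms chunk
```

Under these assumptions each stream carries about 16 KB of audio per second, which is a useful figure when sizing buffers for a streaming transcription client.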

To implement this solution, you'll need to:

  1. Set up your Amazon Chime SDK meetings in your web application.
  2. Implement the audio streaming to Kinesis Video Streams using the SDK's audio analysis feature.
  3. Configure your Kinesis Video Streams to receive the audio data.
  4. Set up Amazon Transcribe to process the audio streams from Kinesis Video Streams.
  5. Implement any additional downstream processing or storage of the transcription results.
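For step 4, a consumer has to read the live media from Kinesis Video Streams before it can hand audio to Amazon Transcribe. The sketch below shows only the read side, using the `GetDataEndpoint` and `GetMedia` calls; the helper names are hypothetical and the clients are injected so the logic can be exercised without AWS access. Note that `GetMedia` returns Matroska (MKV) fragments, so the audio must still be extracted from the container before it is sent to a transcription service; that step is not shown.

```python
def open_live_media(kvs_client, make_media_client, stream_name):
    """Return a file-like payload of live MKV fragments for a stream.

    kvs_client: boto3.client("kinesisvideo")
    make_media_client: callable taking the data endpoint URL and returning
        boto3.client("kinesis-video-media", endpoint_url=url)
    """
    endpoint = kvs_client.get_data_endpoint(
        StreamName=stream_name, APIName="GET_MEDIA"
    )["DataEndpoint"]
    media_client = make_media_client(endpoint)
    response = media_client.get_media(
        StreamName=stream_name,
        StartSelector={"StartSelectorType": "NOW"},  # start at the live edge
    )
    return response["Payload"]


def iter_chunks(payload, chunk_size=4096):
    """Yield fixed-size byte chunks until the payload is exhausted."""
    while True:
        chunk = payload.read(chunk_size)
        if not chunk:
            break
        yield chunk


# Usage:
#   import boto3
#   kvs = boto3.client("kinesisvideo")
#   factory = lambda url: boto3.client("kinesis-video-media", endpoint_url=url)
#   for chunk in iter_chunks(open_live_media(kvs, factory, STREAM_NAME)):
#       handle(chunk)  # hypothetical downstream handler
```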

This architecture should allow you to scale beyond the 10-meeting limit of Media Capture Pipelines while still leveraging managed AWS services for your real-time audio capture and transcription needs.
Sources
Video Chat SDK - Amazon Chime SDK Features - AWS
Using Amazon Chime SDK live transcription - Amazon Chime SDK
Plan for live media streaming from Amazon Connect to Kinesis Video Streams - Amazon Connect

answered 12 days ago
reviewed by an EXPERT 12 days ago
  • If this is true, how can it be done without Chime Media Capture Pipelines, as the question originally required? Every sample seems to rely on Chime Media Capture Pipelines.
