Transcribe Cost/Enablement with Chime SDK


For my product I've integrated the Chime SDK as our WebRTC media server and meeting-management layer, and that's been great. We had planned to build transcription on top of the meeting ourselves, but it was very easy to hook in Amazon Transcribe with the Chime media capture pipelines, so we've started beta testing with it and love the functionality.

The problem is that this integration seems to have a lot of hidden or undocumented behavior that drives cost up significantly. I'd like to get clarity on:

  1. When using Transcribe with the Chime SDK for live transcription, when does the meeting end? Does the meeting no longer end automatically when everyone leaves? If so, why?
  2. Does Transcribe charge by the minute for the full length of the meeting, multiplied by the number of channels/people, even when it's 90% silence? Most speech recognition services nowadays perform silence detection and don't run the model (or charge) when only silence is coming through.
  • As additional context, someone from the team seems to concur in a GitHub issue that automatic meeting end breaks when using media pipelines: "For media pipeline attendee, it is a muted empty audio stream but it is still counted as an attendee. So in your case you would need to listen to event where all attendee leaves then stop the media pipeline meeting manually." Source: https://github.com/aws/amazon-chime-sdk-js/issues/2454

    Since this is an AWS-service-to-AWS-service integration, I would hope it was better documented and handled these cases better.

diamond
asked a year ago, 393 views
2 Answers

When you call the StartMeetingTranscription API, the Amazon Chime SDK creates exactly 1 stream to Amazon Transcribe (using StartStreamTranscription), with two channels. https://docs.aws.amazon.com/chime-sdk/latest/dg/meeting-transcription.html#billing-and-usage

The Transcribe stream exists (and is billable) until StopMeetingTranscription is called or the meeting ends. The logic that automatically ends a meeting is not affected by the existence of a transcription stream.

The impact of a Media Pipeline on the meeting auto-end logic is a known issue (it is counted as an attendee) and is being addressed (it will NOT be counted as an attendee). This work is in progress.
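Until that fix lands, the workaround quoted from the GitHub issue (listen for the moment all attendees have left, then stop things manually) could be sketched roughly as below. This is a minimal illustration, not an official pattern. It assumes meeting events arrive as dictionaries with `eventType`, `meetingId`, `attendeeId` and `externalUserId` fields, that join/leave events are named `chime:AttendeeJoined`, `chime:AttendeeLeft` and `chime:AttendeeDropped`, and that the media pipeline's attendee can be recognised by an `aws:MediaPipeline` prefix on its external user ID. The class name `MeetingPresenceTracker` and the `on_empty` callback are hypothetical.

```python
from typing import Callable, Dict, Set

# Assumption: the media pipeline's attendee can be recognised by this
# ExternalUserId prefix. Verify against your own event payloads.
PIPELINE_PREFIX = "aws:MediaPipeline"

# Assumption: these are the attendee event type names delivered to you.
JOIN_EVENTS = {"chime:AttendeeJoined"}
LEAVE_EVENTS = {"chime:AttendeeLeft", "chime:AttendeeDropped"}


class MeetingPresenceTracker:
    """Tracks which human attendees are currently present in each meeting."""

    def __init__(self, on_empty: Callable[[str], None]) -> None:
        self._present: Dict[str, Set[str]] = {}
        self._on_empty = on_empty  # called with the meeting ID

    def handle_event(self, detail: dict) -> None:
        meeting_id = detail.get("meetingId")
        attendee_id = detail.get("attendeeId")
        if not meeting_id or not attendee_id:
            return  # not an attendee-level event
        if str(detail.get("externalUserId", "")).startswith(PIPELINE_PREFIX):
            return  # ignore the pipeline's own attendee

        event_type = detail.get("eventType")
        if event_type in JOIN_EVENTS:
            self._present.setdefault(meeting_id, set()).add(attendee_id)
        elif event_type in LEAVE_EVENTS:
            present = self._present.get(meeting_id)
            if present is None or attendee_id not in present:
                return  # never saw this attendee join
            present.discard(attendee_id)
            if not present:
                # Last human attendee has left: hand off to cleanup.
                del self._present[meeting_id]
                self._on_empty(meeting_id)
```

In a real deployment the `on_empty` callback would probably call the DeleteMeeting API (for example `delete_meeting(MeetingId=...)` on boto3's `chime-sdk-meetings` client), since ending the meeting also ends the transcription stream. The in-memory dictionary is only for illustration; a stateless handler such as a Lambda function would need a persistent store, and a short grace period before cleanup may be wise so a briefly disconnected attendee can rejoin.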

AWS
answered a year ago

Hi,

Kindly note that an Amazon Chime SDK meeting ends when you invoke the DeleteMeeting API action. Also, a meeting automatically ends after a period of inactivity, based on the following rules:

For Amazon Chime SDK namespace:

  • No audio connections are present in the meeting for more than five minutes.
  • 24 hours have elapsed since the meeting was created.

See here for more information - https://aws.github.io/amazon-chime-sdk-js/modules/faqs.html#when-does-an-amazon-chime-sdk-meeting-end

As for your question about Transcribe charges: for real-time transcription you are charged from when streaming begins until it stops. As long as the connection is open, the system is "listening" in order to transcribe your audio, so processing happens in the background even if nothing is said, hence the charge. More information on Transcribe pricing can be found here: https://aws.amazon.com/transcribe/pricing/
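To make the billing model concrete, here is a rough sketch of the arithmetic implied by this thread: cost scales with how long the stream stays open, not with how much speech it contains. The function name `estimate_transcribe_cost` is hypothetical, and the per-minute rate is a placeholder parameter that should be taken from the pricing page for your region and tier. The default of one stream per meeting follows the other answer in this thread; whether that stream's two channels affect the price is not stated here, so the sketch does not model it.

```python
def estimate_transcribe_cost(
    stream_open_seconds: float,
    price_per_minute: float,
    streams: int = 1,
) -> float:
    """Estimate streaming transcription cost for one meeting.

    stream_open_seconds: time between transcription start and stop
        (or meeting end), including any silence.
    price_per_minute: placeholder rate; look up the real one.
    streams: one per meeting according to the other answer here.
    """
    if stream_open_seconds < 0 or price_per_minute < 0 or streams < 0:
        raise ValueError("inputs must be non-negative")
    minutes = stream_open_seconds / 60.0
    return streams * minutes * price_per_minute


# Hypothetical rate, for illustration only.
RATE = 0.024

# A one-hour meeting that ends promptly...
prompt_end = estimate_transcribe_cost(60 * 60, RATE)
# ...versus the same meeting left open for five more hours
# because a media pipeline kept it from ending automatically.
left_open = estimate_transcribe_cost(6 * 60 * 60, RATE)
```

Under these assumptions the second figure is six times the first even though no additional speech was transcribed, which is why ending the meeting (or stopping transcription) promptly matters more than the number of participants.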

If you need more assistance in finding the root cause behind why the costs are going up, please reach out to AWS Support for technical assistance. You can open a support case with AWS using the following link: https://console.aws.amazon.com/support/home#/case/create

AWS
SUPPORT ENGINEER
answered a year ago
  • "No audio connections are present in the meeting for more than five minutes." -> I don't think this is 100% correct (or correct at all) for any Chime SDK meeting that uses a media pipeline. I can see this in my own system and in the dev response here: https://github.com/aws/amazon-chime-sdk-js/issues/2454. If you add media capture, all users can leave a meeting and it still stays open, because the media capture is counted as an audio connection. That seems both undocumented and a bad design. Can you confirm this is true?

  • "you are charged from when the streaming begins till it stops" -> Can you clarify the other part of my question? Am I charged meeting time * number of audio streams, so that each person in a meeting multiplies the cost of transcription?
