AWS Transcribe Medical BadRequestException: Your stream is too big. Reduce the frame size and try your request again.

I'm sending an audio file through an Amazon Kinesis Video Streams (KVS) stream to Amazon Transcribe Medical. I can get the media from KVS, send it to Transcribe Medical, and even receive a few TranscriptEvents.

But then I get the following error:

2023-02-04T05:10:00.173Z	d0ee90a1-a61f-44f9-826a-fd2c4977425b	ERROR	Error processing transcribe stream. SessionId:  e484b9b6-1657-45d0-bf3d-12c3737e6f65 {
    "name": "BadRequestException",
    "$fault": "client",
    "$metadata": {},
    "message": "Your stream is too big. Reduce the frame size and try your request again."
}

The file being sent through KVS is created with:

    const audio = ffmpeg(uri)
        .format('matroska')
        // '-ar 8000' only works for Transcribe; Transcribe Medical needs 16,000 Hz to 48,000 Hz
        .outputOptions(['-ar 16000', '-acodec pcm_s16le']);

The only changes to my previously working code:

  • Changed from StartStreamTranscriptionCommand to StartMedicalStreamTranscriptionCommand
  • Set StartMedicalStreamTranscriptionCommandInput to:

    {
        LanguageCode: 'en-US',
        MediaEncoding: 'pcm',
        AudioStream: audioStream(),

        // changes for Transcribe Medical
        MediaSampleRateHertz: 16000, // previously 8000
        Specialty: 'CARDIOLOGY',
        Type: 'CONVERSATION',
    }
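
For context on what the sample-rate change implies: with pcm_s16le (2 bytes per sample), and assuming mono audio, going from 8,000 Hz to 16,000 Hz doubles the number of bytes needed for the same duration of audio. The arithmetic can be sketched as below. The helper name pcmChunkBytes and the 100 ms chunk duration are illustrative assumptions, not values taken from the question.

```javascript
// Hypothetical helper: bytes of raw PCM audio in a chunk of the given duration.
// Assumes 16-bit samples (pcm_s16le) and mono audio unless told otherwise.
function pcmChunkBytes(sampleRateHz, chunkMs, bytesPerSample = 2, channels = 1) {
  const samples = Math.floor((sampleRateHz * chunkMs) / 1000);
  return samples * bytesPerSample * channels;
}

console.log(pcmChunkBytes(8000, 100));  // 1600
console.log(pcmChunkBytes(16000, 100)); // 3200
```

So any fixed byte size that was tuned for the 8,000 Hz stream covers half as much audio time at 16,000 Hz, and any fixed duration produces chunks twice as large.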

What could be causing this?

I'm using the AWS SDK for JavaScript v3.

2 Answers

Hi rain-mtucker,

With the Transcribe JavaScript SDK, if you see the error 'The chunk is too big', you can solve it by reducing the highWaterMark. The example below, which sets the highWaterMark, is from the Transcribe JavaScript SDK documentation.

const { PassThrough } = require("stream");
const { createReadStream } = require("fs");
const audioSource = createReadStream("path/to/speech.wav");
const audioPayloadStream = new PassThrough({ highWaterMark: 1 * 1024 }); // Stream chunk less than 1 KB
audioSource.pipe(audioPayloadStream);
const audioStream = async function* () {
  for await (const payloadChunk of audioPayloadStream) {
    yield { AudioEvent: { AudioChunk: payloadChunk } };
  }
};
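
In Node.js streams, highWaterMark is a buffering threshold rather than a hard cap on the size of each chunk, so another option is to bound the frame size explicitly before yielding. The following is a rough sketch, not code from the SDK documentation: the name toAudioEvents and the default of 3200 bytes are assumptions chosen for illustration, and it assumes 16-bit PCM, so frames are kept aligned to 2-byte samples.

```javascript
// Hypothetical helper: re-slices any async iterable of Buffers into frames of
// at most maxFrameBytes and wraps each frame as an AudioEvent.
async function* toAudioEvents(source, maxFrameBytes = 3200) {
  if (maxFrameBytes < 2 || maxFrameBytes % 2 !== 0) {
    throw new RangeError("maxFrameBytes must be a positive even number");
  }
  let carry = Buffer.alloc(0); // odd trailing byte held over to the next chunk
  for await (const chunk of source) {
    const data = carry.length ? Buffer.concat([carry, chunk]) : chunk;
    // Only emit whole 16-bit samples; keep any odd byte for the next round.
    const usable = data.length - (data.length % 2);
    for (let offset = 0; offset < usable; offset += maxFrameBytes) {
      const end = Math.min(offset + maxFrameBytes, usable);
      yield { AudioEvent: { AudioChunk: data.subarray(offset, end) } };
    }
    carry = data.subarray(usable);
  }
  // A single leftover byte at end of input is an incomplete sample and is dropped.
}
```

With this in place, the AudioStream value could be toAudioEvents(audioPayloadStream) instead of the generator above, since a readable stream is itself an async iterable.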
AWS
answered a year ago
  • The highWaterMark value was set to 128; I reduced it to 64, and it still gives the error:

    2023-02-04T17:31:04.829Z	77e692ab-1c41-46b9-99de-4e5fd12363ae	ERROR	Error processing transcribe stream. SessionId:  87997632-5d42-425e-b6ab-6cad9e9f0aff {
        "name": "BadRequestException",
        "$fault": "client",
        "$metadata": {},
        "message": "Your stream is too big. Reduce the frame size and try your request again."
    }
    

    The JavaScript demo client is putting the audio file on a Kinesis Video Streams stream and the backend is getting the audio and passing it to Transcribe streaming. I'm doing this to save the audio stream in a file in the backend. In the end, there will be various clients/systems with microphones that will send audio to KVS.

    Could something in that part of the pipeline be causing the error?

    I can see that all fragments from KVS are being read, but the transcription stops after a few TranscriptEvents. I tried changing highWaterMark to 1024, but I get the same error.

  • Here is the Lambda code that reads audio fragments from KVS and sends them to a combined stream for Transcribe & audio file:

      import Block from 'block-stream2';
      const audioStream = new Block(2);
    
      const combinedStream = new PassThrough();
      const combinedStreamBlock = new Block(2);
      combinedStream.pipe(combinedStreamBlock);
      combinedStreamBlock.on('data', (chunk) => {
        // send to transcribe
        transcribePassthroughStream.write(chunk);
    
        // save to tmp file
        writeRecordingStream.write(chunk);
      });
    
      // audioStream comes from KVS
      audioStream.pipe(combinedStream);
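
The handler above calls write() on both destinations and ignores the return value, which bypasses backpressure. One alternative, sketched below, is to pipe the same source into two PassThrough branches so that the source is paused whenever either consumer falls behind. This is only an illustration: the helper name tee and the branch names are assumptions, and nothing here confirms that backpressure is related to the BadRequestException.

```javascript
const { PassThrough } = require("stream");

// Hypothetical fan-out helper: pipes one readable source into two branches.
// pipe() pauses the source while any branch's buffer is full.
function tee(source) {
  const toTranscribe = new PassThrough();
  const toFile = new PassThrough();
  source.pipe(toTranscribe);
  source.pipe(toFile);
  return { toTranscribe, toFile };
}
```

Both branches need an active consumer; if one is never read, its buffer fills and the source stalls for the other branch as well.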
    
    

Assigning thread_queue_size to our ffmpeg stream in start.sh fixed this bug.

ffmpeg -loglevel $loglevel -thread_queue_size 1024 -re -sn -i $inputb -c:v copy -c:a copy -f flv - | flv+srt - transcript_fifo - | ffmpeg -loglevel $loglevel -thread_queue_size 1024 -y -i - -c:v copy -c:a copy -metadata:s:s:0 language=eng -f $format $output & 
answered a year ago
