AWS Transcribe, best use case

Hey all. Two quick questions. I have this sample code in my Lambda now:

const params = {
  LanguageCode: "en-US",
  Media: {
    MediaFileUri: "https://transcribe-demo.s3-eu-central-1.amazonaws.com/hello_world.wav",
  },
  // MediaFormat has to match the actual file type; the sample file is a .wav
  MediaFormat: "wav",
  TranscriptionJobName: `TranscriptionJob-${Date.now()}`,
  OutputBucketName: "transcribe-output-bucket",
};

const data = await transcribeClient.send(
  new StartTranscriptionJobCommand(params)
);

Question 1: The plan is to send a voice recording URL to an API Gateway endpoint. I then want to transcribe the recording and afterwards send the transcript to an API to summarize it. I don't think AWS has a service for this, so I will probably try out ChatGPT. Is it possible to transcribe directly from the URL, or do I first have to download the file to a bucket?

Question 2: Through the AWS Console I can let Transcribe auto-detect the language. Is this also possible with the JavaScript SDK? I don't see that option.

Thanks all. If you have any suggestions on how to implement this better, feel free to let me know.

1 Answer

Q1: Is it possible to transcribe directly from the URL, or do I first have to download it to a bucket?

A1: According to the documentation [1], Amazon Transcribe takes audio data either as a media file in an Amazon S3 bucket or as a media stream, and converts it to text. Transcribing media files stored in an S3 bucket is a batch transcription; transcribing media streams is a streaming transcription. The two have different rules and requirements. StartTranscriptionJob is the batch operation, so the recording has to be in an S3 bucket when the job starts; a recording hosted somewhere outside S3 needs to be copied to a bucket first. For the summarization step, you can also use SageMaker JumpStart to deploy an LLM [2], which is a fairly straightforward process.
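As a rough sketch of the "copy to S3 first" step, the helper below only plans the copy: it derives an object key, the MediaFormat value, and the s3:// URI from the incoming recording URL. The function name planS3Copy, the "uploads/" prefix, and the bucket name are hypothetical and not part of any AWS SDK. The list of formats is my understanding of what MediaFormat accepts and should be checked against the current API reference. The actual download and upload (for example with PutObjectCommand from @aws-sdk/client-s3) is left out.

```javascript
// Hypothetical helper (not an AWS SDK function): decide where a recording
// from an external URL should be stored in S3 before transcription.
function planS3Copy(sourceUrl, bucket, prefix = "uploads/") {
  // URL parsing drops the query string, so only the path is inspected.
  const { pathname } = new URL(sourceUrl);
  const fileName = decodeURIComponent(pathname.split("/").pop() || "");
  if (!fileName) {
    throw new Error(`No file name in URL: ${sourceUrl}`);
  }

  // Assumed list of MediaFormat values; verify against the Transcribe docs.
  const supported = ["mp3", "mp4", "wav", "flac", "ogg", "amr", "webm", "m4a"];
  const ext = fileName.includes(".")
    ? fileName.split(".").pop().toLowerCase()
    : "";
  if (!supported.includes(ext)) {
    throw new Error(`Unsupported media format: "${ext}"`);
  }

  const key = `${prefix}${fileName}`;
  return {
    bucket,
    key,
    mediaFormat: ext, // goes into MediaFormat
    mediaFileUri: `s3://${bucket}/${key}`, // goes into Media.MediaFileUri
  };
}
```

After uploading the downloaded bytes to plan.bucket under plan.key, plan.mediaFileUri and plan.mediaFormat can be passed to the transcription job parameters.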

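Q2: Through the AWS Console I can let Transcribe auto-detect the language. Is this also possible with the JavaScript SDK?

A2: Yes. Set IdentifyLanguage to true instead of passing LanguageCode, and optionally narrow the candidates with LanguageOptions. You can also attach a custom vocabulary per language through LanguageIdSettings. Please take a look at this code snippet: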

"use strict";

import {
  StartTranscriptionJobCommand,
  TranscribeClient,
} from "@aws-sdk/client-transcribe";

const REGION = "us-east-1";
const BUCKET = "YOUR_BUCKET";
const KEY = "YOUR_FILE";

const transcribeClient = new TranscribeClient({ region: REGION });
// Short random suffix so that job names do not collide
const random = (Math.random() + 1).toString(36).substring(7);
console.log("key = " + KEY);

export const params = {
  // Let Transcribe detect the language; do not set LanguageCode as well
  IdentifyLanguage: true,
  // Optional: restrict detection to the languages you expect
  LanguageOptions: ["en-US", "fr-FR"],
  // Optional: per-language settings such as a custom vocabulary
  LanguageIdSettings: {
    "en-US": {
      VocabularyName: "custom-vocab-en_US",
    },
    "fr-FR": {
      VocabularyName: "custom-vocab-fr_FR",
    },
  },
  Media: {
    // The media file has to be an object in S3
    MediaFileUri: `s3://${BUCKET}/${KEY}`,
  },
  MediaFormat: "mp3",
  TranscriptionJobName: `Transcribe-Job-${random}`,
  OutputBucketName: BUCKET,
};

export const run = async () => {
  try {
    const data = await transcribeClient.send(
      new StartTranscriptionJobCommand(params)
    );
    console.log("Success - job started", data);
    return data;
  } catch (err) {
    console.log("Error", err);
  }
};
run();
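
If the Lambda sometimes knows the language up front and sometimes does not, one possible way to handle both cases is a small builder that sets either LanguageCode or IdentifyLanguage, never both, since a request is expected to carry only one of them. This is a minimal sketch: buildTranscriptionParams and its argument names are hypothetical and only the returned field names come from the StartTranscriptionJob request.

```javascript
// Hypothetical helper: build StartTranscriptionJob parameters with either a
// fixed language or automatic language identification.
function buildTranscriptionParams({
  jobName,
  mediaFileUri,
  mediaFormat,
  outputBucket,
  languageCode, // e.g. "en-US"; leave undefined to auto-detect
  languageOptions, // optional list of candidate languages for auto-detect
}) {
  const params = {
    TranscriptionJobName: jobName,
    Media: { MediaFileUri: mediaFileUri },
    MediaFormat: mediaFormat,
    OutputBucketName: outputBucket,
  };

  if (languageCode) {
    // Language is known: pass it directly.
    params.LanguageCode = languageCode;
  } else {
    // Language is unknown: ask Transcribe to identify it.
    params.IdentifyLanguage = true;
    if (Array.isArray(languageOptions) && languageOptions.length > 0) {
      params.LanguageOptions = languageOptions;
    }
  }
  return params;
}
```

The returned object can be passed to new StartTranscriptionJobCommand(...) in the same way as params in the snippet above.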

Resources:

[1] https://docs.aws.amazon.com/transcribe/latest/dg/how-input.html

[2] https://docs.aws.amazon.com/sagemaker/latest/dg/studio-jumpstart.html

AWS
answered 9 months ago
