Transcribe: Pipe-separated LanguageCode values fail to satisfy enum value set


I'm getting a very strange error when starting a transcription job. I am using the AWS SDK for PHP.

I am transcribing audio files that contain both English and Hebrew, using these parameters:

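  // $client is an Aws\TranscribeService\TranscribeServiceClient instance
  $result = $client->startTranscriptionJob([
      'LanguageCode'          => 'en-US|he-IL',
      'Media'                 => [
          'MediaFileUri'      => 'xxx',
      ],
      'OutputBucketName'      => 'xxx',
      'OutputKey'             => 'xxx.json',
      'TranscriptionJobName'  => 'xxx-job',
  ]);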

This returns the following error:

  BadRequestException (client): 1 validation error detected: Value 'en-US|he-IL' at 'languageCode' failed to satisfy constraint: Member must satisfy enum value set: [en-IE, ar-AE, te-IN, zh-TW, en-US, ta-IN, en-AB, en-IN, zh-CN, ar-SA, en-ZA, gd-GB, th-TH, tr-TR, ru-RU, pt-PT, nl-NL, it-IT, id-ID, fr-FR, es-ES, de-DE, ga-IE, af-ZA, en-NZ, ko-KR, hi-IN, de-CH, vi-VN, cy-GB, ms-MY, he-IL, da-DK, en-AU, pt-BR, en-WL, fa-IR, sv-SE, ja-JP, es-US, fr-CA, en-GB]

Note: if I set LanguageCode to just 'en-US' (or just 'he-IL'), it works fine, except of course that the transcript is then only English or only Hebrew.

All of the documentation shows LanguageCode as a pipe-separated string, but as soon as I specify more than one language code, I get this "Member must satisfy enum value set" exception.

asked a year ago
1 Answer
Accepted Answer

The LanguageCode parameter is meant to be used when your audio file contains only one language. In the documentation, the pipe symbol means "or": you select exactly one of the listed values.
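To make the "pick one" reading concrete, here is a minimal sketch that reuses the request-array shape from the question. The helper name buildSingleLanguageJobParams is hypothetical, and it only assembles the parameters; it does not call AWS.

```php
<?php
// Hypothetical helper: assembles StartTranscriptionJob parameters for audio
// in ONE language. It does not call AWS; it only builds the request array.
function buildSingleLanguageJobParams(string $languageCode, string $jobName, string $mediaUri): array
{
    // A pipe in the docs means "or": pass one code such as 'en-US' or 'he-IL',
    // never a pipe-joined string like 'en-US|he-IL'.
    if (strpos($languageCode, '|') !== false) {
        throw new InvalidArgumentException(
            "LanguageCode takes a single value, got '$languageCode'"
        );
    }

    return [
        'TranscriptionJobName' => $jobName,
        'LanguageCode'         => $languageCode,
        'Media'                => ['MediaFileUri' => $mediaUri],
    ];
}

// Usage: choose one of the documented alternatives.
$params = buildSingleLanguageJobParams('he-IL', 'xxx-job', 'xxx');
```

The pipe check is only a local guard for illustration; the service itself performs the real enum validation shown in the error above.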

However, there is a way to do what you want.

Setting the IdentifyMultipleLanguages parameter in your transcription job request enables automatic multi-language identification. Use this parameter if your media file contains more than one language. If you include IdentifyMultipleLanguages, you can optionally use LanguageOptions to provide a list of the language codes you think may be present in your media file.

LanguageOptions is an array of strings, so you can provide both 'en-US' and 'he-IL' in it.
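A minimal sketch of the corrected request parameters, assuming AWS SDK for PHP v3. The helper buildMultiLanguageJobParams is hypothetical and only builds the array; the commented-out client call shows where it would be passed, and the region there is a placeholder. LanguageCode is left out because, as I read the StartTranscriptionJob API reference, a request may include only one of LanguageCode, IdentifyLanguage or IdentifyMultipleLanguages.

```php
<?php
// Hypothetical helper (sketch, assuming AWS SDK for PHP v3): builds
// StartTranscriptionJob parameters for audio that mixes several languages.
// LanguageCode is deliberately omitted: a request should carry only one of
// LanguageCode, IdentifyLanguage or IdentifyMultipleLanguages.
function buildMultiLanguageJobParams(
    string $jobName,
    string $mediaUri,
    string $bucket,
    string $key,
    array $languageOptions
): array {
    return [
        'TranscriptionJobName'      => $jobName,
        'Media'                     => ['MediaFileUri' => $mediaUri],
        'OutputBucketName'          => $bucket,
        'OutputKey'                 => $key,
        'IdentifyMultipleLanguages' => true,
        // A list of candidate codes, not a pipe-separated string.
        'LanguageOptions'           => $languageOptions,
    ];
}

$params = buildMultiLanguageJobParams('xxx-job', 'xxx', 'xxx', 'xxx.json', ['en-US', 'he-IL']);

// Sending it requires credentials and real S3 locations, so it is commented out.
// The region below is a placeholder assumption.
// require 'vendor/autoload.php';
// $client = new Aws\TranscribeService\TranscribeServiceClient([
//     'region'  => 'us-east-1',
//     'version' => 'latest',
// ]);
// $result = $client->startTranscriptionJob($params);
```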

AWS
answered a year ago
  • Hi, thanks for your reply! In hindsight, the pipe symbol representing "or" makes total sense... :-)

    I added IdentifyMultipleLanguages and LanguageOptions, and the job now returns a result. Unfortunately, my audio files are single sentences from a language-learning app that mix English and Hebrew within the same sentence, so it treats every word as English. Maybe I'll look into improving the vocabulary, but thanks for the answer; this question is now closed!
