AWS transcribe - is there a way to transcribe numbers not as digits?


I am running AWS Transcribe with the following command:

aws transcribe start-transcription-job --language-code en-US --media-format wav --media MediaFileUri=s3://my-bucket/my-audio-file.wav --output-bucket-name my-output-bucket

In my output files, every number that is spoken gets transcribed as digits. For example, "I just spent fifty dollars" is transcribed as "I just spent 50 dollars".

Is there a way to transcribe numbers in their written form and not digits?

1 Answer


At the moment, there is no API parameter to disable Transcribe's number-to-digit formatting, but there are some post-processing steps you can apply. For example, you can use the num2words package:

from num2words import num2words

# Define a function to convert numbers in a sentence to words
def convert_numbers_to_words(sentence):
    words = []
    for word in sentence.split():
        # Check if the word consists only of decimal digits (safe to pass to int)
        if word.isdecimal():
            # Convert the number to words and append to the list
            words.append(num2words(int(word)))
        else:
            # Append the original word to the list
            words.append(word)
    # Join the words back into a sentence
    return " ".join(words)

# Example usage
sentence = "I just spent 50 dollars"
converted_sentence = convert_numbers_to_words(sentence)
print(converted_sentence)
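
Alternatively, you can use the inflect package:

import inflect

def convert_numbers_to_words(text):
    p = inflect.engine()
    new_words = []

    for word in text.split():
        # Replace digit-only tokens with their spelled-out form
        if word.isdigit():
            word = p.number_to_words(word)
        # Keep every token, converted or not
        new_words.append(word)

    return ' '.join(new_words)

transcribed_text = "I just spent 50 dollars"
converted_text = convert_numbers_to_words(transcribed_text)
print(converted_text)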
  • or just build your own simple dictionary, {number: word}, and do the replacement with re
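
A minimal sketch of the dictionary approach, assuming you only need a small, known set of numbers. The NUMBER_WORDS table and the replace_numbers function are hypothetical names made up for this example; only the standard-library re module is used. Unlike splitting on whitespace, the regex also catches a number followed by punctuation (such as "50."), and anything missing from the table is left unchanged.

```python
import re

# Hypothetical lookup table; extend it with the numbers you expect to hear.
NUMBER_WORDS = {
    "1": "one",
    "2": "two",
    "10": "ten",
    "50": "fifty",
    "100": "one hundred",
}

# Match a whole numeric token, including forms like "3.50" or "1,000",
# but not digits embedded in a word such as "50th".
NUMBER_PATTERN = re.compile(r"(?<!\w)\d+(?:[.,]\d+)*(?!\w)")

def replace_numbers(text, table=NUMBER_WORDS):
    """Replace each numeric token found in `table`; leave unknown ones as-is."""
    def lookup(match):
        token = match.group(0)
        return table.get(token, token)
    return NUMBER_PATTERN.sub(lookup, text)

print(replace_numbers("I just spent 50 dollars."))
```

This only scales to a handful of values, so for arbitrary numbers one of the libraries above is probably the better choice.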

Hope that helps.

answered a year ago
