My Java Transcribe job is not completing - using the older Java 1.x SDK


When I check the status it remains in progress forever. I am using a test MP3 file generated with Amazon Polly and saved to my S3 bucket, and still no luck. No errors; it just never completes. I hope I am missing something obvious!

    private static StartTranscriptionJobRequest buildVTTRequest(String subFolder, String audioFileId, Long userId) throws WIOSException {
        StartTranscriptionJobRequest request = new StartTranscriptionJobRequest();
        request.setMediaSampleRateHertz(16000);
        request.setMediaFormat("mp3");
        request.setLanguageCode("en-US");
        request.setTranscriptionJobName(userId.toString());

        Media media = new Media();
        String s3BucketURI = "...";
        media.setMediaFileUri(s3BucketURI);
        request.setMedia(media);
        request.setOutputBucketName("...");
        request.setOutputKey("...");
        return request;
    }

    ...
    StartTranscriptionJobRequest request = buildVTTRequest(subFolder, audioFileId, userId);
    StartTranscriptionJobResult response = transcribeClient.startTranscriptionJob(request);
    TranscriptionJob transcriptionJob = response.getTranscriptionJob();

asked a month ago
1 Answer

It seems your transcription job is not completing for one of a few possible reasons: configuration issues, permission problems, or job status handling. Here are the common pitfalls and things to check:

  1. Check Permissions

Make sure that your AWS IAM role has the appropriate permissions to access both S3 and Amazon Transcribe. The role should have permissions such as:

- transcribe:StartTranscriptionJob
- transcribe:GetTranscriptionJob
- s3:GetObject for the input file (MP3 in your case)
- s3:PutObject for the output location in S3

Check that the permissions are in place for both the input file (in S3) and the output destination.

  2. Job Status Handling

You mentioned that the job remains in progress forever. This can happen if you are not polling the job status after starting the job.

After calling startTranscriptionJob, fetch the job status periodically in a loop. The status is one of QUEUED, IN_PROGRESS, COMPLETED, or FAILED.

Here's an example of how to do that with the AWS SDK for Java 2.x. In SDK 1.x the same calls exist on the AmazonTranscribe client, with getter-style accessors such as getTranscriptionJobStatus(), which returns a String:

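    import software.amazon.awssdk.services.transcribe.TranscribeClient;
    import software.amazon.awssdk.services.transcribe.model.GetTranscriptionJobRequest;
    import software.amazon.awssdk.services.transcribe.model.GetTranscriptionJobResponse;
    import software.amazon.awssdk.services.transcribe.model.TranscriptionJob;
    import software.amazon.awssdk.services.transcribe.model.TranscriptionJobStatus;

    private static TranscriptionJob waitForTranscriptionJob(TranscribeClient transcribeClient, String jobName) {
        // Poll until the transcription job reaches a terminal state
        while (true) {
            // Create request to check job status
            GetTranscriptionJobRequest getTranscriptionJobRequest = GetTranscriptionJobRequest.builder()
                    .transcriptionJobName(jobName)
                    .build();

            // Get the transcription job details
            GetTranscriptionJobResponse response = transcribeClient.getTranscriptionJob(getTranscriptionJobRequest);
            TranscriptionJob job = response.transcriptionJob();

            // transcriptionJobStatus() returns an enum in SDK 2.x, so compare
            // against enum constants; a String.equals check would never match
            TranscriptionJobStatus status = job.transcriptionJobStatus();
            if (status == TranscriptionJobStatus.COMPLETED) {
                return job; // Job has completed
            } else if (status == TranscriptionJobStatus.FAILED) {
                throw new RuntimeException("Transcription job failed: " + job.failureReason());
            }

            // Sleep for a few seconds before checking again
            try {
                Thread.sleep(5000); // Wait 5 seconds before polling again
            } catch (InterruptedException e) {
                // Restore the interrupt flag and stop; otherwise the loop would spin without sleeping
                Thread.currentThread().interrupt();
                throw new RuntimeException("Interrupted while waiting for job " + jobName, e);
            }
        }
    }

In the above code: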

The job status is checked every 5 seconds. Once the job completes, the result will be returned. If the job fails, an exception is thrown.
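
The loop above polls indefinitely, so a job that never leaves IN_PROGRESS blocks the caller forever. A minimal sketch of a bounded variant follows. It is deliberately SDK-agnostic: the status lookup is passed in as a `Supplier<String>`, and the names `JobPoller` and `waitForTerminalStatus` are hypothetical, not part of any AWS SDK.

```java
import java.time.Duration;
import java.util.function.Supplier;

/** Hypothetical helper (not part of any AWS SDK): polls until a terminal status or a timeout. */
final class JobPoller {
    private JobPoller() {}

    static String waitForTerminalStatus(Supplier<String> statusSource, Duration interval, Duration timeout) {
        long start = System.nanoTime();
        while (true) {
            String status = statusSource.get();
            // COMPLETED and FAILED are terminal; any other value means "keep waiting"
            if ("COMPLETED".equals(status) || "FAILED".equals(status)) {
                return status;
            }
            // Give up once the overall timeout has elapsed
            if (System.nanoTime() - start >= timeout.toNanos()) {
                throw new IllegalStateException("Timed out waiting for job; last status: " + status);
            }
            try {
                Thread.sleep(interval.toMillis());
            } catch (InterruptedException e) {
                // Restore the interrupt flag and stop polling
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while waiting for job", e);
            }
        }
    }
}
```

With SDK 1.x the supplier could be something like `() -> transcribeClient.getTranscriptionJob(new GetTranscriptionJobRequest().withTranscriptionJobName(jobName)).getTranscriptionJob().getTranscriptionJobStatus()`, assuming `transcribeClient` is an `AmazonTranscribe` instance. A timeout turns "in progress forever" into an error you can log and investigate.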

  3. S3 Bucket Permissions

Ensure that the S3 bucket you're using has the correct permissions for Amazon Transcribe to read the input MP3 file and write the output. This covers both the input file's S3 URI and the output bucket.

Example IAM policy granting this access. Attach it to the role or user that starts the job; if you use it as an S3 bucket policy instead, it also needs a Principal element:

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Action": [
            "s3:GetObject",
            "s3:PutObject"
          ],
          "Resource": [
            "arn:aws:s3:::your-input-bucket/*",
            "arn:aws:s3:::your-output-bucket/*"
          ]
        }
      ]
    }

  4. Media File Format and Sample Rate

Make sure the sample rate of the audio file matches what you specify. In your case, you're specifying a sample rate of 16000, which is valid for MP3 files, but verify that the audio file really has a sample rate of 16 kHz.

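You can check the properties of the MP3 file to confirm its sample rate. Amazon Polly produces MP3 at 22050 Hz by default for standard voices (24000 Hz for neural voices) unless a different sample rate is requested, so a mismatch is plausible here. If the sample rate is different, either re-encode the file with the correct sample rate or adjust the setMediaSampleRateHertz value accordingly. The setting is optional, so you can also omit it and let Amazon Transcribe determine the rate.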
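
One way to confirm the rate without external tools is to read it from the first MPEG audio frame header. The sketch below assumes a well-formed file with an optional ID3v2 tag at the start, and it accepts the first byte sequence that looks like a frame header, which can misfire on unusual files. `Mp3SampleRate` and `sampleRateHz` are hypothetical names.

```java
/** Hypothetical helper: reads the sample rate from the first MPEG audio frame header. */
final class Mp3SampleRate {
    // Sample rates in Hz, indexed by [version bits][sample rate index]
    private static final int[][] RATES = {
        {11025, 12000, 8000},  // version bits 00: MPEG 2.5
        {},                    // version bits 01: reserved
        {22050, 24000, 16000}, // version bits 10: MPEG 2
        {44100, 48000, 32000}  // version bits 11: MPEG 1
    };

    private Mp3SampleRate() {}

    static int sampleRateHz(byte[] data) {
        int pos = 0;
        // Skip an ID3v2 tag: "ID3", 2 version bytes, 1 flags byte, 4-byte sync-safe size
        if (data.length >= 10 && data[0] == 'I' && data[1] == 'D' && data[2] == '3') {
            int size = ((data[6] & 0x7F) << 21) | ((data[7] & 0x7F) << 14)
                     | ((data[8] & 0x7F) << 7) | (data[9] & 0x7F);
            pos = 10 + size;
        }
        for (; pos + 2 < data.length; pos++) {
            int b0 = data[pos] & 0xFF;
            int b1 = data[pos + 1] & 0xFF;
            int b2 = data[pos + 2] & 0xFF;
            // A frame header starts with 11 set bits (frame sync)
            if (b0 != 0xFF || (b1 & 0xE0) != 0xE0) {
                continue;
            }
            int version = (b1 >> 3) & 0x03;
            int layer = (b1 >> 1) & 0x03;
            int bitrateIndex = (b2 >> 4) & 0x0F;
            int rateIndex = (b2 >> 2) & 0x03;
            // Reserved or invalid field values mean this was not a real header
            if (version == 1 || layer == 0 || bitrateIndex == 0x0F || rateIndex == 3) {
                continue;
            }
            return RATES[version][rateIndex];
        }
        throw new IllegalArgumentException("No MPEG audio frame header found");
    }
}
```

For a local copy of the file, a call such as `Mp3SampleRate.sampleRateHz(java.nio.file.Files.readAllBytes(java.nio.file.Paths.get("speech.mp3")))` returns the rate to compare against the value passed to setMediaSampleRateHertz; the file name here is only a placeholder.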

  5. Check Transcribe Client

Ensure that the transcribeClient you're using is correctly configured for Amazon Transcribe. For example, check that the region or endpoint is set correctly and that the client is properly authenticated.

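    TranscribeClient transcribeClient = TranscribeClient.builder()
            .region(Region.US_EAST_1) // Set the correct region; it must match the region of your S3 bucket
            .credentialsProvider(ProfileCredentialsProvider.create()) // Or use any appropriate credentials provider
            .build();

  6. Debugging Transcription Job Issues

You can enable debug logging for the AWS SDK to help diagnose the issue by setting the SDK loggers to DEBUG. Note that the following call on its own is unlikely to have any effect, because java.util.logging reads handler levels from its configuration file rather than from system properties:

    System.setProperty("java.util.logging.ConsoleHandler.level", "ALL");

Instead, configure the logging backend your application uses: SDK 1.x logs through Apache Commons Logging under the com.amazonaws logger, and SDK 2.x logs through SLF4J under software.amazon.awssdk. With DEBUG enabled, the logs show the SDK’s API calls and responses, which could provide more insight into why the job is not completing.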
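
How SDK log output reaches the console depends on the logging backend in use. As a rough sketch for SDK 1.x, the level can be raised programmatically through java.util.logging. This assumes Apache Commons Logging (which SDK 1.x logs through) falls back to java.util.logging in your application, which is typically the case only when no Log4j binding is on the classpath. `SdkDebugLogging` and `enable` are hypothetical names; `com.amazonaws` is the SDK 1.x root logger name.

```java
import java.util.logging.ConsoleHandler;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical helper: routes SDK 1.x debug output to the console via java.util.logging. */
final class SdkDebugLogging {
    // Keep a strong reference: java.util.logging holds loggers weakly,
    // so an unreferenced logger can lose its settings after garbage collection
    private static final Logger SDK_LOGGER = Logger.getLogger("com.amazonaws");

    private SdkDebugLogging() {}

    static Logger enable() {
        // Commons Logging DEBUG messages arrive as FINE in java.util.logging
        SDK_LOGGER.setLevel(Level.FINE);
        // Attach a console handler once; the default root handler only prints INFO and above
        if (SDK_LOGGER.getHandlers().length == 0) {
            ConsoleHandler handler = new ConsoleHandler();
            handler.setLevel(Level.ALL);
            SDK_LOGGER.addHandler(handler);
        }
        // Avoid duplicate lines from the root logger's own handler
        SDK_LOGGER.setUseParentHandlers(false);
        return SDK_LOGGER;
    }
}
```

Calling `SdkDebugLogging.enable()` once at startup, before the Transcribe client is created, should be enough under the assumption above. If your application already uses Log4j or an SLF4J binding, set the level in that backend's configuration instead.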

regards, M Zubair https://zeonedge.com

answered a month ago
