InvalidJobIdException

0

Hi,

I'm trying to use the Textract async features and I'm getting an error that I don't understand. When I use the synchronous functions everything works fine, but with the async functions I get an InvalidJobIdException.

My code:

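import boto3, sys, re, json
import logging
import pandas as pd
from botocore.exceptions import ClientError
from pandas.io.json import json_normalize
import trp.trp2 as t2

# Logger used by TextractWrapper below
logger = logging.getLogger(__name__)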

# find S3Bucket and S3Key for invoice files
S3Bucket = "Bucketname"
S3key = "knime/Prüfprotokoll 0050-0035_image0.png"

textract = boto3.client(aws_access_key_id="KeyId", aws_secret_access_key="Secret", service_name='textract', region_name='eu-central-1')

class TextractWrapper:
    """Encapsulates Textract functions."""
    def __init__(self, textract_client, s3_resource, sqs_resource):
        """
        :param textract_client: A Boto3 Textract client.
        :param s3_resource: A Boto3 Amazon S3 resource.
        :param sqs_resource: A Boto3 Amazon SQS resource.
        """
        self.textract_client = textract_client
        self.s3_resource = s3_resource
        self.sqs_resource = sqs_resource

    def start_detection_job(
            self, bucket_name, document_file_name, sns_topic_arn, sns_role_arn):
        """
        Starts an asynchronous job to detect text elements in an image stored in an
        Amazon S3 bucket. Textract publishes a notification to the specified Amazon SNS
        topic when the job completes.
        The image must be in PNG, JPG, or PDF format.

        :param bucket_name: The name of the Amazon S3 bucket that contains the image.
        :param document_file_name: The name of the document image stored in Amazon S3.
        :param sns_topic_arn: The Amazon Resource Name (ARN) of an Amazon SNS topic
                              where the job completion notification is published.
        :param sns_role_arn: The ARN of an AWS Identity and Access Management (IAM)
                             role that can be assumed by Textract and grants permission
                             to publish to the Amazon SNS topic.
        :return: The ID of the job.
        """
        try:
            response = self.textract_client.start_document_text_detection(
                DocumentLocation={
                    'S3Object': {'Bucket': bucket_name, 'Name': document_file_name}},
                NotificationChannel={
                    'SNSTopicArn': sns_topic_arn, 'RoleArn': sns_role_arn})
            job_id = response['JobId']
            logger.info(
                "Started text detection job %s on %s.", job_id, document_file_name)
        except ClientError:
            logger.exception("Couldn't detect text in %s.", document_file_name)
            raise
        else:
            return job_id
        
    def get_analysis_job(self, job_id):
        """
        Gets data for a previously started detection job that includes additional
        elements.
        :param job_id: The ID of the job to retrieve.
        :return: The job data, including a list of blocks that describe elements
                 detected in the image.
        """
        try:
            response = self.textract_client.get_document_analysis(
                JobId=job_id)
            job_status = response['JobStatus']
            logger.info("Job %s status is %s.", job_id, job_status)
        except ClientError:
            logger.exception("Couldn't get data for job %s.", job_id)
            raise
        else:
            return response


hallo = TextractWrapper(textract, S3Bucket, "asd")
hallo.start_detection_job(S3Bucket, S3key, "arn:aws:sns:eu-central-1:11111:MyTopic", "arn:aws:iam::1111:role/TextTractRolePaul")

This code works fine and I get a job ID, for example 'd8e8b13285a9e4494a95774bd0c525a5486e12691e26848da5c58536d425ea70'.

But when I call:

response_async = hallo.get_analysis_job('d8e8b13285a9e4494a95774bd0c525a5486e12691e26848da5c58536d425ea70')

I get this exception: InvalidJobIdException: An error occurred (InvalidJobIdException) when calling the GetDocumentAnalysis operation: Request has invalid Job Id

It doesn't matter how long I wait.

I followed this guide to set up Textract for async analysis: https://docs.aws.amazon.com/textract/latest/dg/api-async-roles.html

Any pointers on where I went wrong, or what I should check?

Kind regards,

Paul

asked 3 years ago · 1.3K views
3 Answers
1

Hello,

I see that you are starting the job with the "start_document_text_detection" API. To get the results of that job, you need to use the "get_document_text_detection" API.

Similarly, for a job started with "start_document_analysis" you have to use "get_document_analysis". These APIs cannot be used interchangeably: a job ID from one operation is rejected as invalid by the getter of the other.
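As a rough illustration of that pairing, here is a minimal sketch. The helper name fetch_job_result and the lookup table are hypothetical and not part of boto3; only the four Textract operation names are real client methods. It assumes you pass in an existing boto3 Textract client and remember which start operation produced the job ID.

```python
# Hypothetical lookup table (not part of boto3): each asynchronous Textract
# "start" operation mapped to the "get" operation that can read its results.
# Only the two pairs discussed in this thread are listed.
GETTER_FOR_STARTER = {
    "start_document_text_detection": "get_document_text_detection",
    "start_document_analysis": "get_document_analysis",
}


def fetch_job_result(textract_client, start_operation, job_id):
    """Call the getter that matches the operation which started the job.

    textract_client is assumed to be a boto3 Textract client;
    start_operation is the name of the method that returned job_id.
    """
    getter_name = GETTER_FOR_STARTER[start_operation]
    getter = getattr(textract_client, getter_name)
    return getter(JobId=job_id)
```

With the code from the question, this would be called as fetch_job_result(textract, "start_document_text_detection", job_id), which ends up in get_document_text_detection rather than get_document_analysis.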

AWS
SUPPORT ENGINEER
answered 3 years ago
  • Hi Shashank,

    You have pointed this out perfectly. I was able to solve the issue following your answer here for one of my customers. Thank you.

0

Hey Paul!

Just to confirm, when you make the call are you using the following call format?

response = client.get_document_analysis(
    JobId='string',
    MaxResults=123,
    NextToken='string'
)
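In that call format, NextToken is only needed when the results span several pages. A minimal sketch of following the token is below. The helper name collect_blocks is hypothetical; it assumes get_page is a bound client method such as client.get_document_analysis or client.get_document_text_detection (whichever matches the start call), and that the job has already finished.

```python
def collect_blocks(get_page, job_id, max_results=1000):
    """Gather the Blocks from every result page of a finished Textract job.

    get_page is assumed to be a bound boto3 Textract getter, for example
    client.get_document_text_detection. This helper is only a sketch.
    """
    blocks = []
    next_token = None
    while True:
        kwargs = {"JobId": job_id, "MaxResults": max_results}
        if next_token:
            # Only send NextToken after the first page has been fetched.
            kwargs["NextToken"] = next_token
        response = get_page(**kwargs)
        blocks.extend(response.get("Blocks", []))
        next_token = response.get("NextToken")
        if not next_token:
            return blocks
```

A real caller would first check response['JobStatus'] and wait while it is still IN_PROGRESS; that polling step is left out here to keep the sketch short.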

AWS
answered 3 years ago
0

Hi Dani,

yes I do.

Best regards,

Paul

answered 3 years ago