Textract: is it possible that the Analyze Document demo and the regular API are not aligned?

0

Hello all,

I can't post a sample document here, but let's say I'm working with PDF invoices. Today I found a couple of oddballs where my scripts that consume Textract consistently fail to return 1 structured table (out of 3; the other 2 are semi-structured) present in the document. I've tried everything, including calling boto3 analyze_document stripped of everything but the 'TABLES' feature, and I still only get 2 tables for these invoices. If I put the same invoices through the Analyze Document demo (https://us-east-1.console.aws.amazon.com/textract/home?region=us-east-1#/demo), I always get the 3 tables present in the invoices, as expected. So I was thinking maybe the demo is not actually consuming the same service that I am through scripting. Is that even a possibility? I can't think of any other explanation.

Thanks!

  • Hello,

    From the information you have provided so far, I think there might be a bug in your scripts. Without reviewing the script I won't be able to provide any more details.

    AFAIK, under the hood the service works just the same. The only variable here is your scripts. We can collaborate offline if you'd like me to review and determine a solution.

asked a month ago, 118 views
2 Answers
1

It seems you are using a one-page PDF with the sync API, analyze_document, through boto3? The console demo uses the async API even for a one-page PDF, and the sync and async APIs use different paths for rendering PDFs. That might explain the difference. I would recommend trying the async API, start_document_analysis.
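For anyone who wants to try this, here is a minimal sketch of the async route with boto3, assuming the PDF has already been uploaded to S3 (the async operations read the document from S3 instead of taking raw bytes). The function name find_tables_async, the polling parameters, and the bucket/key values are hypothetical and only for illustration; start_document_analysis and get_document_analysis are the Textract client operations. It simply polls for completion; it is not a production-ready job handler.

```python
import time


def find_tables_async(client, bucket, key, poll_seconds=5.0, max_polls=120):
    """Run Textract async analysis on an S3 document and return its TABLE blocks.

    `client` is expected to be a boto3 Textract client, e.g.
    boto3.client("textract", region_name="us-east-1").
    """
    # Start the asynchronous job, requesting only the TABLES feature.
    job = client.start_document_analysis(
        DocumentLocation={"S3Object": {"Bucket": bucket, "Name": key}},
        FeatureTypes=["TABLES"],
    )
    job_id = job["JobId"]

    # Poll until the job leaves the IN_PROGRESS state (or we give up).
    for _ in range(max_polls):
        result = client.get_document_analysis(JobId=job_id)
        status = result["JobStatus"]
        if status != "IN_PROGRESS":
            break
        time.sleep(poll_seconds)
    else:
        raise TimeoutError(f"Textract job {job_id} did not finish in time")

    if status != "SUCCEEDED":
        raise RuntimeError(f"Textract job {job_id} ended with status {status}")

    # Results are paginated: follow NextToken until every block is collected.
    blocks = list(result.get("Blocks", []))
    while result.get("NextToken"):
        result = client.get_document_analysis(
            JobId=job_id, NextToken=result["NextToken"]
        )
        blocks.extend(result.get("Blocks", []))

    # Keep only the table blocks so they can be counted or inspected.
    return [block for block in blocks if block["BlockType"] == "TABLE"]
```

Calling len(find_tables_async(client, "my-bucket", "invoices/sample.pdf")) (placeholder bucket and key) and comparing the count with what analyze_document returns for the same file should show whether the sync and async paths really differ for your invoices.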

AWS
answered a month ago
0

Thank you! I actually found a related question right after this and the replies indeed point to the same explanation. I'll definitely try.

Best!

answered a month ago
