Textract: is it possible that the Analyze Document demo and the regular API are not aligned?

0

Hello all,

I can't post a sample document here, but let's say I'm working with PDF invoices. Today I found a couple of oddballs where my scripts that consume Textract consistently fail to return 1 structured table (out of 3; the other 2 are semi-structured) present in the document. I've tried everything, including calling boto3 analyze_document stripped of everything but the 'TABLES' feature, and I still get only 2 tables for these invoices. If I put the same invoices through the Analyze Document demo (https://us-east-1.console.aws.amazon.com/textract/home?region=us-east-1#/demo), I always get all 3 tables present in the invoices, as expected. So I was thinking the demo may not actually be using the same service that I reach through scripting. Is that even a possibility? I can't think of any other explanation.
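For context, the synchronous call described here can be sketched roughly as follows. This is a minimal illustration rather than the actual script: the helper names `count_tables` and `analyze_sync`, the file path argument, and the default region are assumptions made for the example. It sends the PDF bytes to `analyze_document` with only the 'TABLES' feature and counts the `TABLE` blocks in the response.

```python
def count_tables(blocks):
    """Count the blocks whose BlockType is TABLE in a Textract response."""
    return sum(1 for block in blocks if block.get("BlockType") == "TABLE")


def analyze_sync(pdf_path, region="us-east-1"):
    """Run the synchronous analyze_document call on a local single-page PDF.

    pdf_path and region are placeholders; AWS credentials are assumed
    to be configured in the environment.
    """
    import boto3  # imported here so count_tables works without boto3

    client = boto3.client("textract", region_name=region)
    with open(pdf_path, "rb") as handle:
        response = client.analyze_document(
            Document={"Bytes": handle.read()},
            FeatureTypes=["TABLES"],
        )
    return response["Blocks"]


if __name__ == "__main__":
    # "invoice.pdf" is a hypothetical file name.
    blocks = analyze_sync("invoice.pdf")
    print("tables found:", count_tables(blocks))
```

Counting `TABLE` blocks this way gives a number that can be compared directly with what the console demo shows for the same file.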

Thanks!

  • Hello,

    From the information you have provided so far, I think there might be a bug in your scripts. Without reviewing the script I won't be able to provide any more details.

    AFAIK, under the hood the service works just the same. The only variable here is your scripts. We can collaborate offline if you'd like me to review and determine a solution.

asked 2 months ago, 128 views
2 Answers
1

It seems you are using a one-page PDF with the synchronous API (analyze_document) through boto3. The console demo uses the asynchronous API even for a one-page PDF, and the sync and async APIs use different paths for rendering PDFs. That might explain the difference. I would recommend trying the async start_document_analysis.
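A minimal sketch of the async route with boto3 is below. It assumes the PDF has already been uploaded to S3 (the async API reads documents from S3, not from raw bytes); the bucket, key, helper names, and polling settings are placeholders chosen for illustration. It starts a job with `start_document_analysis`, polls `get_document_analysis` until the job leaves `IN_PROGRESS`, and follows `NextToken` to gather every page of blocks.

```python
import time


def collect_blocks(client, job_id, poll_seconds=2.0, max_polls=150):
    """Poll an async Textract job and return all of its blocks."""
    for _ in range(max_polls):
        response = client.get_document_analysis(JobId=job_id)
        status = response["JobStatus"]
        if status != "IN_PROGRESS":
            break
        time.sleep(poll_seconds)
    else:
        raise TimeoutError(f"job {job_id} did not finish in time")

    if status not in ("SUCCEEDED", "PARTIAL_SUCCESS"):
        raise RuntimeError(f"job {job_id} ended with status {status}")

    # Results are paginated; keep following NextToken until it is absent.
    blocks = list(response.get("Blocks", []))
    while "NextToken" in response:
        response = client.get_document_analysis(
            JobId=job_id, NextToken=response["NextToken"]
        )
        blocks.extend(response.get("Blocks", []))
    return blocks


def analyze_tables_async(bucket, key, region="us-east-1"):
    """Start an async TABLES analysis for an S3 object and wait for it."""
    import boto3  # imported here so collect_blocks can be tested offline

    client = boto3.client("textract", region_name=region)
    job = client.start_document_analysis(
        DocumentLocation={"S3Object": {"Bucket": bucket, "Name": key}},
        FeatureTypes=["TABLES"],
    )
    return collect_blocks(client, job["JobId"])


if __name__ == "__main__":
    # Hypothetical bucket and key names.
    blocks = analyze_tables_async("my-invoice-bucket", "invoices/sample.pdf")
    tables = [b for b in blocks if b["BlockType"] == "TABLE"]
    print("tables found:", len(tables))
```

Simple polling keeps the example short; for production workloads, the SNS notification option of `start_document_analysis` may be a better fit than a polling loop.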

AWS
answered 2 months ago
0

Thank you! I actually found a related question right after this and the replies indeed point to the same explanation. I'll definitely try.

Best!

answered 2 months ago
