Textract: is it possible that the Analyze Document demo and the regular version are not aligned?

0

Hello all,

I can't post a sample document here, but let's say I'm working with PDF invoices. Today I found a couple of oddballs where my scripts that consume Textract consistently fail to return 1 structured table (out of 3; the other 2 are semi-structured) present in the document. I've tried everything, including calling boto3 analyze_document stripped of everything but the 'TABLES' feature, and I still only get 2 tables for these invoices. If I put the same invoices through the Analyze Document demo (https://us-east-1.console.aws.amazon.com/textract/home?region=us-east-1#/demo), I always get all 3 tables present in the invoices, as expected. So I was thinking maybe the demo is not actually using the same service that I am through scripting. Is that even a possibility? I can't think of any other explanation.

Thanks!

  • Hello,

    From the information you have provided so far, I think there might be a bug in your scripts. Without reviewing the script I won't be able to provide any more details.

    AFAIK, under the hood the service works just the same. The only variable here is your scripts. We can collaborate offline if you'd like me to review and determine a solution.

Asked 2 months ago, 129 views
2 Answers
1

It seems you are using a single-page PDF with the synchronous analyze_document API through boto3? The console demo uses the asynchronous API even for a single-page PDF, and the sync and async APIs use different paths for rendering PDFs. That might explain the difference. I would recommend trying the asynchronous start_document_analysis.
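To check whether the sync and async paths really disagree on a given invoice, something along these lines might help. It is only a rough sketch, not the console's actual code: the helper names (count_tables, analyze_tables_sync, analyze_tables_async), the polling interval, and the timeout are all assumptions made for illustration, and it assumes the PDF has already been uploaded to S3, since start_document_analysis reads its input from an S3 object.

```python
import time


def count_tables(blocks):
    """Count TABLE blocks in a list of Textract Block dicts."""
    return sum(1 for block in blocks if block.get("BlockType") == "TABLE")


def analyze_tables_sync(client, bucket, key):
    """Synchronous path: a single analyze_document call."""
    response = client.analyze_document(
        Document={"S3Object": {"Bucket": bucket, "Name": key}},
        FeatureTypes=["TABLES"],
    )
    return response["Blocks"]


def analyze_tables_async(client, bucket, key, poll_seconds=2.0, timeout_seconds=300.0):
    """Asynchronous path: start a job, poll until it finishes, collect all pages.

    poll_seconds and timeout_seconds are illustrative defaults, not AWS guidance.
    """
    job = client.start_document_analysis(
        DocumentLocation={"S3Object": {"Bucket": bucket, "Name": key}},
        FeatureTypes=["TABLES"],
    )
    job_id = job["JobId"]

    # Poll until the job leaves the IN_PROGRESS state or we give up.
    deadline = time.monotonic() + timeout_seconds
    while True:
        page = client.get_document_analysis(JobId=job_id)
        status = page["JobStatus"]
        if status != "IN_PROGRESS":
            break
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Textract job {job_id} did not finish in time")
        time.sleep(poll_seconds)

    # Anything other than SUCCEEDED (FAILED, PARTIAL_SUCCESS) is treated as an error here.
    if status != "SUCCEEDED":
        raise RuntimeError(f"Textract job {job_id} ended with status {status}")

    # Results can be paginated; follow NextToken until it is absent.
    blocks = list(page.get("Blocks", []))
    while "NextToken" in page:
        page = client.get_document_analysis(JobId=job_id, NextToken=page["NextToken"])
        blocks.extend(page.get("Blocks", []))
    return blocks
```

With a real client created via boto3.client("textract", region_name="us-east-1"), calling count_tables on the result of each function for the same bucket and key (placeholders you would substitute) should show whether the two paths return a different number of tables for the problem invoices.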

AWS
Answered 2 months ago
0

Thank you! I actually found a related question right after this and the replies indeed point to the same explanation. I'll definitely try.

Best!

Answered 2 months ago
