Textract - Extract form key values in reading order


Hi all,

I'm currently using Textract with Python to process a multi-page PDF form, which overall is going well.

I'm having some specific issues with the key-value pairs, though. When I extract them, they don't seem to be in any particular order. Is there a suggested method to keep them in roughly reading order, or to organise them that way?

There are multiple fields in the form that have duplicate keys (e.g. Yes/No questions), so I was hoping to maintain the order to map which Yes/No response belongs to which question.

Uploading the example doc to the web UI and downloading keyvalues.csv from the response has the pairs in exactly the order I'm looking for, but I'm not sure how they arrange it that way.

Cheers

Syley
Asked 2 years ago · 478 views
1 Answer

Textract does not currently guarantee any ordering of the key-value pairs in a Forms response. The general recommendation is to follow the Best Practices for Amazon Textract so that input documents are optimized for better results.

AWS
Taka_M
Answered 2 years ago
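
Since the API does not guarantee an order, one possible client-side workaround is to sort the KEY blocks yourself using the `Page` and `Geometry.BoundingBox` fields that Textract returns for each block. The following is a minimal sketch, not necessarily how the console builds keyvalues.csv. The function names and the `line_tolerance` parameter are made up for illustration, and the top-to-bottom, left-to-right rule is an assumption that will not suit multi-column layouts.

```python
def _text(block, by_id):
    """Join the WORD / SELECTION_ELEMENT children of a block into one string."""
    parts = []
    for rel in block.get("Relationships", []):
        if rel["Type"] != "CHILD":
            continue
        for child_id in rel["Ids"]:
            child = by_id[child_id]
            if child["BlockType"] == "WORD":
                parts.append(child["Text"])
            elif child["BlockType"] == "SELECTION_ELEMENT":
                # Checkboxes report SELECTED / NOT_SELECTED instead of text.
                parts.append(child["SelectionStatus"])
    return " ".join(parts)


def key_values_in_reading_order(blocks, line_tolerance=0.01):
    """Return (key, value) tuples sorted by page, then top, then left.

    `blocks` is the "Blocks" list of a Textract response.
    `line_tolerance` (hypothetical knob) buckets keys whose Top coordinates
    are close together into the same row; coordinates are page ratios (0..1).
    """
    by_id = {b["Id"]: b for b in blocks}
    keys = [
        b for b in blocks
        if b["BlockType"] == "KEY_VALUE_SET" and "KEY" in b.get("EntityTypes", [])
    ]

    def position(block):
        box = block["Geometry"]["BoundingBox"]
        row = round(box["Top"] / line_tolerance)
        return (block.get("Page", 1), row, box["Left"])

    pairs = []
    for key in sorted(keys, key=position):
        value_ids = [
            i
            for rel in key.get("Relationships", [])
            if rel["Type"] == "VALUE"
            for i in rel["Ids"]
        ]
        value = " ".join(_text(by_id[i], by_id) for i in value_ids)
        pairs.append((_text(key, by_id), value))
    return pairs
```

With boto3, `blocks` would be `response["Blocks"]` from `analyze_document(..., FeatureTypes=["FORMS"])`. For an asynchronous job, the `Blocks` lists from every `get_document_analysis` page (following `NextToken`) would need to be concatenated before sorting. Because duplicate keys are returned as a list of tuples rather than a dict, repeated Yes/No questions keep their position.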
  • I am also looking to maintain the order of text in the key/value response from getDocumentAnalysis. I am processing medical reports in which sections repeat, so a group of keys is duplicated, for example for multiple physicians: AttLName, AttFName, AttMName. I cannot ensure that a First Name is associated with the correct Last Name. It seems to me that the API returns the data ordered by confidence score (descending). When I run the same document through the Analyze Document feature in the AWS Console, the order is preserved. Why are the results from the console and the API different?
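
For repeating sections like the physician fields above, one heuristic is to walk the key/value pairs once they are in reading order and start a new record whenever a key is seen a second time. This is only a sketch under that assumption: the function name `group_repeating_sections` is hypothetical, the sample values are invented, and it presumes the pairs have already been sorted (for example by page and bounding-box position).

```python
def group_repeating_sections(pairs):
    """Split an ordered list of (key, value) tuples into one dict per section.

    A repeated key is treated as the start of the next section, so each
    record holds at most one value per key.
    """
    records = []
    current = {}
    for key, value in pairs:
        if key in current:
            # Key seen again: the previous section is complete.
            records.append(current)
            current = {}
        current[key] = value
    if current:
        records.append(current)
    return records


# Invented sample values, using the key names from the comment above.
ordered_pairs = [
    ("AttLName", "Smith"), ("AttFName", "Ann"), ("AttMName", "J"),
    ("AttLName", "Jones"), ("AttFName", "Bob"), ("AttMName", "K"),
]
physicians = group_repeating_sections(ordered_pairs)
```

Here `physicians` holds two dicts, each pairing a first name with the last name from the same section. The heuristic breaks down if a section is missing the key that would otherwise signal the boundary, so results should be spot-checked against the source document.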
