Textract - Extract form key values in reading order


Hi all,

I'm currently using Textract with Python to process a multi-page PDF form, and overall it's going well.

I'm having a specific issue with the key-value pairs, though. When I extract them, they don't seem to come back in any particular order. Is there a recommended way to keep them in roughly reading order, or to sort them into it?

Several fields in the form have duplicate keys (e.g. Yes/No questions), so I was hoping to preserve the order so I can tell which Yes/No response belongs to which question.

When I upload the example document to the web UI and download the resulting keyvalues.csv, the pairs are in exactly the order I want, but I'm not sure how the console arranges them that way.

Cheers

Syley
Asked 2 years ago · Viewed 477 times
1 answer

Textract does not currently guarantee any ordering of the Forms (key-value) response. The general recommendation is to follow the Best Practices for Amazon Textract so that input documents are optimized for better results.
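Since the API gives no ordering guarantee, one common workaround is to impose reading order yourself using each KEY block's `Page` and `Geometry.BoundingBox` (whose `Top`/`Left` values are ratios of the page size). The sketch below is only a heuristic, not how the console builds keyvalues.csv (that isn't documented here). It assumes `blocks` is the full `Blocks` list from `AnalyzeDocument` / `GetDocumentAnalysis` with the `FORMS` feature. The function names and the `row_tolerance` value are hypothetical choices you would tune for your form.

```python
from typing import Any

Block = dict[str, Any]


def block_text(block: Block, blocks_by_id: dict[str, Block]) -> str:
    """Join WORD text (and checkbox status) under a block's CHILD relationships."""
    parts: list[str] = []
    for rel in block.get("Relationships", []):
        if rel["Type"] != "CHILD":
            continue
        for child_id in rel["Ids"]:
            child = blocks_by_id[child_id]
            if child["BlockType"] == "WORD":
                parts.append(child["Text"])
            elif child["BlockType"] == "SELECTION_ELEMENT":
                # Checkbox values (e.g. Yes/No boxes) carry a status, not text.
                parts.append(child["SelectionStatus"])
    return " ".join(parts)


def key_values_in_reading_order(
    blocks: list[Block], row_tolerance: float = 0.01
) -> list[tuple[str, str]]:
    """Hypothetical helper: (key, value) pairs sorted by page, row, then left."""
    by_id = {b["Id"]: b for b in blocks}
    entries = []
    for block in blocks:
        if block["BlockType"] != "KEY_VALUE_SET" or "KEY" not in block.get("EntityTypes", []):
            continue
        values = []
        for rel in block.get("Relationships", []):
            if rel["Type"] == "VALUE":
                values.extend(block_text(by_id[v], by_id) for v in rel["Ids"])
        box = block["Geometry"]["BoundingBox"]
        entries.append(
            (block.get("Page", 1), box["Top"], box["Left"],
             block_text(block, by_id), " ".join(values))
        )

    # Sort top-to-bottom, then group into rows: a key whose Top is within
    # row_tolerance (assumed value) of the row's first key shares that row.
    entries.sort(key=lambda e: (e[0], e[1]))
    rows: list[list[tuple]] = []
    for entry in entries:
        first = rows[-1][0] if rows else None
        if first and first[0] == entry[0] and entry[1] - first[1] <= row_tolerance:
            rows[-1].append(entry)
        else:
            rows.append([entry])

    # Within each row, read left to right.
    ordered: list[tuple[str, str]] = []
    for row in rows:
        for _, _, _, key, value in sorted(row, key=lambda e: e[2]):
            ordered.append((key, value))
    return ordered
```

For a multi-page async job, collect `Blocks` from every `get_document_analysis` call (following `NextToken`) before calling the helper. Row-major ordering suits single-column forms; side-by-side columns would need a column split first.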

AWS
Taka_M
Answered 2 years ago
  • I am also trying to preserve the order of text in the key/value response from getDocumentAnalysis. I'm processing medical reports in which sections repeat, so a group of keys is duplicated, for example for multiple physicians: AttLName, AttFName, AttMName. I cannot make sure each First Name is associated with the correct Last Name. It looks to me as though the API returns the data ordered by confidence score (descending). When I run the same document through Analyze Document in the AWS Console, the order is preserved. Why do the console and the API return different results?
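For repeated sections like the physician fields above, one possible approach, assuming the pairs have already been put into reading order (e.g. by sorting on page and bounding-box position), is to start a new record whenever a key that is already present in the current record appears again. `split_repeated_sections` is a hypothetical helper, and the trailing-colon normalisation is an assumption about how the keys are printed on the form. The heuristic only holds if each section's fields are read contiguously; sections laid out side by side would interleave under row-major ordering.

```python
def split_repeated_sections(pairs: list[tuple[str, str]]) -> list[dict[str, str]]:
    """Group ordered (key, value) pairs into one dict per repeated section.

    A new record starts whenever a key already present in the current record
    shows up again (e.g. a second "AttLName").
    """
    records: list[dict[str, str]] = []
    current: dict[str, str] = {}
    for key, value in pairs:
        # Assumed normalisation: "AttLName:" and "AttLName" are the same key.
        name = key.strip().rstrip(":").strip()
        if name in current:
            records.append(current)
            current = {}
        current[name] = value
    if current:
        records.append(current)
    return records
```

With pairs such as `("AttLName:", "Smith"), ("AttFName:", "Jane"), ("AttLName:", "Doe"), ("AttFName:", "John")`, this yields two records, one per physician.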
