Textract - Extract form key values in reading order


Hi all,

I'm currently using Textract with Python to process a multi-page PDF form, which overall is going well.

I'm having a specific issue with the key-value pairs, though: once extracted, they don't seem to be in any particular order. Is there a suggested method to keep them in roughly reading order, or to reorganise them that way?

Multiple fields in the form have duplicate keys (e.g. Yes/No questions), so I was hoping to preserve the order to map each Yes/No response to the question it belongs to.

Uploading the example doc to the web UI and downloading the response's keyvalues.csv gives them in exactly the order I'm looking for, but I'm not sure how it arranges them that way.

Cheers

Syley
asked 2 years ago · 454 views
1 Answer

Textract does not currently guarantee any ordering of the key-value pairs in a Forms response. The general recommendation is to follow the Best Practices for Amazon Textract so that input documents are optimized for better results.
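
Since the response carries no ordering guarantee, one possible workaround is to sort the key blocks yourself using the page number and bounding box that each block includes. The snippet below is only a minimal sketch under some assumptions: `blocks` is the combined `Blocks` list from the Textract response, the form is single-column, and `key_values_in_reading_order` and `row_tolerance` are names made up here for illustration. It is not how the console builds its CSV, which isn't documented in this thread.

```python
from typing import Dict, List, Tuple


def _text_of(block: dict, by_id: Dict[str, dict]) -> str:
    """Join the WORD / SELECTION_ELEMENT children of a block into one string."""
    parts = []
    for rel in block.get("Relationships", []):
        if rel["Type"] != "CHILD":
            continue
        for child_id in rel["Ids"]:
            child = by_id[child_id]
            if child["BlockType"] == "WORD":
                parts.append(child["Text"])
            elif child["BlockType"] == "SELECTION_ELEMENT":
                # Checkboxes report SELECTED / NOT_SELECTED instead of text.
                parts.append(child["SelectionStatus"])
    return " ".join(parts)


def key_values_in_reading_order(
    blocks: List[dict], row_tolerance: float = 0.01
) -> List[Tuple[str, str]]:
    """Return (key, value) pairs sorted by page, then top-to-bottom, then left-to-right.

    row_tolerance (hypothetical parameter) is the fraction of page height
    within which two keys are treated as sitting on the same line.
    """
    by_id = {b["Id"]: b for b in blocks}
    rows = []
    for block in blocks:
        if block["BlockType"] != "KEY_VALUE_SET":
            continue
        if "KEY" not in block.get("EntityTypes", []):
            continue
        value_text = ""
        for rel in block.get("Relationships", []):
            if rel["Type"] == "VALUE":
                value_text = " ".join(_text_of(by_id[i], by_id) for i in rel["Ids"])
        box = block["Geometry"]["BoundingBox"]
        # Bucket Top so keys on the same visual line sort left to right.
        line = round(box["Top"] / row_tolerance)
        rows.append((block.get("Page", 1), line, box["Left"],
                     _text_of(block, by_id), value_text))
    rows.sort(key=lambda r: r[:3])
    return [(key, value) for _, _, _, key, value in rows]
```

Because the result is a list of pairs rather than a dict, duplicate keys such as repeated Yes/No questions are kept as separate entries in page order. Multi-column layouts would need a column-aware sort, and keys that fall right on a bucket boundary may need a larger `row_tolerance`.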

AWS
Taka_M
answered a year ago
  • I am also looking to maintain the order of text in the key/value response from getDocumentAnalysis. I'm processing medical reports where sections repeat, so a group of keys is duplicated, for example for multiple physicians: AttLName, AttFName, AttMName. I cannot ensure that a First Name is associated with the correct Last Name. It seems to me that the API returns the data ordered by confidence score (descending). When I run the same document through the AWS Analyze Document service from the AWS Console, the order is preserved. Why are the results from the console and the API different?
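
For the repeated-section case described in this comment, one possible approach, once the pairs have been put into reading order (for example by sorting on each block's page and bounding box), is to start a new record whenever a key is seen for the second time. This is only a sketch: `group_repeated_keys` is a hypothetical helper, and the physician names in the usage example are invented sample data.

```python
from typing import Dict, Iterable, List, Tuple


def group_repeated_keys(pairs: Iterable[Tuple[str, str]]) -> List[Dict[str, str]]:
    """Split ordered (key, value) pairs into records.

    A new record starts as soon as a key already present in the
    current record shows up again. Assumes pairs are in reading order.
    """
    records: List[Dict[str, str]] = []
    current: Dict[str, str] = {}
    for key, value in pairs:
        if key in current:
            # Seeing the same key again means the next section has begun.
            records.append(current)
            current = {}
        current[key] = value
    if current:
        records.append(current)
    return records


# Invented sample data, already in reading order.
pairs = [
    ("AttLName", "Smith"), ("AttFName", "Ann"), ("AttMName", "B"),
    ("AttLName", "Jones"), ("AttFName", "Raj"), ("AttMName", "K"),
]
physicians = group_repeated_keys(pairs)
```

Here `physicians` holds one dict per physician, so each first name stays with the last name from the same section. The heuristic relies entirely on the input order being correct, so it only helps after the pairs have been sorted.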
