Textract - Extract form key values in reading order


Hi all,

I'm currently using Textract to process a multi-page PDF form with Python, and overall it's going well.

I'm having some specific issues with the key-value pairs, though. When I extract them, they don't come back in any particular order. Is there a suggested method to keep them in roughly reading order, or to sort them that way afterwards?

Several fields in the form have duplicate keys (e.g. Yes/No questions), so I was hoping to preserve the order to work out which Yes/No response belongs to which question.

When I upload the example document to the web UI and download keyvalues.csv from the response, the pairs are in exactly the order I'm looking for, but I'm not sure how it arranges them that way.
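
A common workaround (not necessarily how the web UI does it) is to reorder the pairs yourself using each KEY block's bounding box. Below is a minimal sketch, assuming `blocks` is the `Blocks` list from an AnalyzeDocument or GetDocumentAnalysis response run with the FORMS feature; the helper names `block_text` and `key_values_in_reading_order` are hypothetical:

```python
from typing import Any

Block = dict[str, Any]


def block_text(block: Block, by_id: dict[str, Block]) -> str:
    """Join the WORD text (and checkbox states) under a KEY or VALUE block."""
    parts = []
    for rel in block.get("Relationships", []):
        if rel["Type"] != "CHILD":
            continue
        for child_id in rel["Ids"]:
            child = by_id[child_id]
            if child["BlockType"] == "WORD":
                parts.append(child["Text"])
            elif child["BlockType"] == "SELECTION_ELEMENT":
                # Checkboxes / radio buttons report SELECTED or NOT_SELECTED.
                parts.append(child["SelectionStatus"])
    return " ".join(parts)


def key_values_in_reading_order(blocks: list[Block]) -> list[tuple[str, str]]:
    """Return (key, value) pairs sorted by page, then top edge, then left edge."""
    by_id = {b["Id"]: b for b in blocks}
    pairs = []
    for b in blocks:
        if b["BlockType"] != "KEY_VALUE_SET" or "KEY" not in b.get("EntityTypes", []):
            continue
        # A KEY block points at its VALUE block(s) through a VALUE relationship.
        value_ids = [
            vid
            for rel in b.get("Relationships", [])
            if rel["Type"] == "VALUE"
            for vid in rel["Ids"]
        ]
        value = " ".join(block_text(by_id[vid], by_id) for vid in value_ids)
        box = b["Geometry"]["BoundingBox"]
        # Default to page 1 if a block carries no Page field.
        sort_key = (b.get("Page", 1), box["Top"], box["Left"])
        pairs.append((sort_key, block_text(b, by_id), value))
    pairs.sort(key=lambda p: p[0])
    return [(key, value) for _, key, value in pairs]


# Usage, assuming `response` is the parsed Textract JSON:
# for key, value in key_values_in_reading_order(response["Blocks"]):
#     print(key, "->", value)
```

Sorting on the raw Top value is deliberately simple: two keys that sit side by side on the same visual line can have slightly different Top values, so a strict sort may occasionally swap them, and multi-column layouts need extra handling.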

Cheers

Syley
Asked 2 years ago · 478 views
1 answer

Textract does not currently guarantee any particular ordering of the key-value pairs in a Forms response. The general recommendation is to follow the Best Practices for Amazon Textract to make sure input documents are optimized for better results.
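
Since the order isn't guaranteed, any reordering has to happen client-side, and for a multi-page PDF processed asynchronously that means gathering every block first: GetDocumentAnalysis returns its results in pages linked by `NextToken`. Below is a minimal sketch, assuming a boto3 Textract client and a job that has already finished; the helper name `collect_analysis_blocks` is hypothetical:

```python
from typing import Any


def collect_analysis_blocks(client: Any, job_id: str) -> list[dict[str, Any]]:
    """Fetch every Block of a finished StartDocumentAnalysis job, following NextToken."""
    blocks: list[dict[str, Any]] = []
    request: dict[str, Any] = {"JobId": job_id}
    while True:
        response = client.get_document_analysis(**request)
        status = response["JobStatus"]
        if status not in ("SUCCEEDED", "PARTIAL_SUCCESS"):
            # IN_PROGRESS or FAILED: the caller should keep polling or inspect the error.
            raise RuntimeError(f"Textract job {job_id} has status {status}")
        blocks.extend(response.get("Blocks", []))
        token = response.get("NextToken")
        if not token:
            return blocks
        request["NextToken"] = token


# Usage, assuming boto3 is installed and the job was started with FeatureTypes=["FORMS"]:
# import boto3
# blocks = collect_analysis_blocks(boto3.client("textract"), job_id)
```

Each returned block keeps its `Page` field, so the combined list can then be sorted by page and bounding box.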

AWS
Taka_M
Answered 2 years ago
  • I'm also trying to preserve the order of text in the key-value response from getDocumentAnalysis. I'm processing medical reports in which sections repeat, so a group of keys is duplicated, for example for multiple physicians: AttLName, AttFName, AttMName. I can't be sure that each first name is associated with the correct last name. It looks to me as though the API returns the data ordered by confidence score (descending). When I run the same document through Analyze Document in the AWS Console, the order is preserved. Why do the results from the console and the API differ?
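
For repeated groups such as AttLName/AttFName/AttMName, one heuristic (an assumption about the form layout, not something Textract provides) is to treat fields whose top edges are close together as one row and read each row left to right. The sketch below assumes each physician's name fields sit on a single horizontal line and that key, value, page, top and left have already been pulled out of the KEY_VALUE_SET blocks. `Field`, `group_into_rows`, `rows_to_records` and the 0.01 tolerance (1% of the page height, since Textract bounding-box coordinates are ratios of the page size) are hypothetical and would need tuning per form:

```python
from dataclasses import dataclass


@dataclass
class Field:
    key: str
    value: str
    page: int
    top: float   # BoundingBox.Top, as a fraction of page height
    left: float  # BoundingBox.Left, as a fraction of page width


def group_into_rows(fields: list[Field], tolerance: float = 0.01) -> list[list[Field]]:
    """Cluster fields whose top edge is within `tolerance` of the row's first field."""
    rows: list[list[Field]] = []
    for f in sorted(fields, key=lambda x: (x.page, x.top)):
        anchor = rows[-1][0] if rows else None
        if anchor is not None and anchor.page == f.page and f.top - anchor.top <= tolerance:
            rows[-1].append(f)
        else:
            rows.append([f])
    # Within each row, read left to right.
    return [sorted(row, key=lambda x: x.left) for row in rows]


def rows_to_records(rows: list[list[Field]]) -> list[dict[str, str]]:
    """One dict per row, e.g. one per physician when each sits on its own line."""
    return [{f.key: f.value for f in row} for row in rows]
```

If the repeated groups are stacked vertically instead (one field per line), splitting the reading-order list into fixed-size chunks would be the analogous heuristic.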
