Using Textract to extract long-form text in FORMS mode


We have a set of documents that contain question-and-answer sections, such as "Enter Statement of Interest Below", each followed by an area where users enter long-form text. When we ran a few of these documents through Textract, we found that, depending on how the responses are formatted, Textract has a hard time recognizing these fields as key-value pairs.
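To see exactly which words Textract attached to each key, it can help to dump the key-value pairs from a FORMS response. The following is a minimal sketch, assuming the response came from boto3's analyze_document call with FeatureTypes=["FORMS"]; the helper names key_value_pairs and _block_text are hypothetical. It follows each KEY block's VALUE and CHILD relationships down to the WORD blocks.

```python
def _block_text(block, blocks_by_id):
    """Join the text of a block's CHILD WORD blocks (and selection marks)."""
    words = []
    for rel in block.get("Relationships", []):
        if rel["Type"] != "CHILD":
            continue
        for child_id in rel["Ids"]:
            child = blocks_by_id[child_id]
            if child["BlockType"] == "WORD":
                words.append(child["Text"])
            elif child["BlockType"] == "SELECTION_ELEMENT":
                words.append(child.get("SelectionStatus", ""))
    return " ".join(words)


def key_value_pairs(response):
    """Hypothetical helper: map key text -> value text for a FORMS response."""
    blocks_by_id = {b["Id"]: b for b in response["Blocks"]}
    pairs = {}
    for block in response["Blocks"]:
        # Only KEY_VALUE_SET blocks tagged as KEY start a pair.
        if block["BlockType"] != "KEY_VALUE_SET" or "KEY" not in block.get("EntityTypes", []):
            continue
        key_text = _block_text(block, blocks_by_id)
        value_text = ""
        for rel in block.get("Relationships", []):
            if rel["Type"] == "VALUE":
                # A VALUE relationship points at the VALUE block(s) for this key.
                value_text = " ".join(
                    _block_text(blocks_by_id[vid], blocks_by_id) for vid in rel["Ids"]
                )
        pairs[key_text] = value_text
    return pairs
```

For example, passing the result of boto3.client("textract").analyze_document(Document={"Bytes": data}, FeatureTypes=["FORMS"]) to key_value_pairs would show whether only the first line of a multi-line answer ended up under "Enter Statement of Interest Below". Keys with identical text overwrite each other in this sketch.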

For example, if the value for one of these fields has one sentence of text per line (contained within the box allotted for text entry), only the first line is associated with the section's question as a key-value pair. Another quirk we found is that if a large block of text has been copied and pasted into the text entry box, Textract does not recognize it as a value at all. However, when we used queries whose text was the full text of the key (for example, "Enter Statement of Interest Below"), the expected text was extracted, although this was hit or miss depending on the formatting of the long-form text.
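The query approach described above can be sketched as follows. This is a minimal example that assumes boto3 and the QUERIES feature of AnalyzeDocument; build_query_request and query_answers are hypothetical helper names. Each key's full text becomes a query, and the answers are read from QUERY_RESULT blocks linked to each QUERY block through ANSWER relationships.

```python
def build_query_request(document_bytes, key_texts):
    """Hypothetical helper: kwargs for textract.analyze_document, one query per key."""
    return {
        "Document": {"Bytes": document_bytes},
        "FeatureTypes": ["QUERIES"],
        # Use the full key text (e.g. "Enter Statement of Interest Below") as the query.
        "QueriesConfig": {"Queries": [{"Text": text} for text in key_texts]},
    }


def query_answers(response):
    """Hypothetical helper: map each query's text to its list of answer texts."""
    blocks_by_id = {b["Id"]: b for b in response["Blocks"]}
    answers = {}
    for block in response["Blocks"]:
        if block["BlockType"] != "QUERY":
            continue
        results = []
        for rel in block.get("Relationships", []):
            # QUERY blocks point to their QUERY_RESULT blocks via ANSWER.
            if rel["Type"] == "ANSWER":
                results.extend(blocks_by_id[rid]["Text"] for rid in rel["Ids"])
        answers[block["Query"]["Text"]] = results
    return answers
```

Usage would look like resp = boto3.client("textract").analyze_document(**build_query_request(data, ["Enter Statement of Interest Below"])) followed by query_answers(resp). A query with no answer maps to an empty list, which makes the "hit or miss" cases easy to spot across a batch of documents.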

We are wondering whether this is a result of the formatting of the input documents or a quirk of Textract itself.

You can find a copy of the input document here.

asked a year ago, 306 views
1 Answer

Hello,

Thanks for using Amazon Textract. Ideally, the Textract FORMS option is used to detect key-value pairs whose values range from a few words up to a few sentences [1]. As I understand it, your requirement is to capture a long paragraph as the value. Because Textract is a fully managed service that uses pre-trained models for OCR, we would need sample filled forms to analyze this behavior for your requirement.
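When a value runs to a long paragraph, one possible workaround (not something Textract does itself) is to bypass key-value detection for that field and collect the raw LINE blocks that sit below the key on the page. This is a minimal sketch, assuming a response that includes LINE blocks with normalized BoundingBox geometry, as AnalyzeDocument and DetectDocumentText return; text_below_key and its stop_texts parameter are hypothetical, and the stop label would be whatever heading follows the text box in your form.

```python
def _norm(text):
    """Lowercase and collapse whitespace so labels compare reliably."""
    return " ".join(text.lower().split())


def text_below_key(response, key_text, stop_texts=()):
    """Hypothetical layout fallback: gather LINE blocks below a key's line.

    Collects every LINE on the key's page whose top edge is below the key
    line's bottom edge, in top-to-bottom order, stopping at the first line
    whose text matches one of stop_texts.
    """
    lines = [b for b in response["Blocks"] if b["BlockType"] == "LINE"]
    key = next((b for b in lines if _norm(b["Text"]) == _norm(key_text)), None)
    if key is None:
        return ""
    page = key.get("Page", 1)
    box = key["Geometry"]["BoundingBox"]
    key_bottom = box["Top"] + box["Height"]
    stops = {_norm(s) for s in stop_texts}
    below = sorted(
        (b for b in lines
         if b.get("Page", 1) == page and b["Geometry"]["BoundingBox"]["Top"] >= key_bottom),
        key=lambda b: (b["Geometry"]["BoundingBox"]["Top"], b["Geometry"]["BoundingBox"]["Left"]),
    )
    collected = []
    for b in below:
        if _norm(b["Text"]) in stops:
            break  # reached the next section's label
        collected.append(b["Text"])
    return "\n".join(collected)
```

Matching on layout rather than on Textract's key-value links means the result no longer depends on whether the answer was typed line by line or pasted as one block, at the cost of needing to know the label that ends each section.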

Regarding the text not being recognized, there are multiple possible causes, ranging from the DPI of the image to the document's language [2]. Please go through them. I would also suggest reaching out to AWS Support [3] (Textract) with a detailed description of your issue and use case, along with sample documents. We will troubleshoot accordingly and involve the service team for suggestions on your exact requirement.

[1] https://docs.aws.amazon.com/textract/latest/dg/how-it-works-kvp.html
[2] https://docs.aws.amazon.com/textract/latest/dg/textract-best-practices.html
[3] https://docs.aws.amazon.com/awssupport/latest/user/case-management.html#creating-a-support-case

Rakesh, AWS Support Engineer, answered a year ago
