Using Textract to extract long-form text in FORMS mode


We have a set of documents containing question-and-answer sections, such as "Enter Statement of Interest Below", each followed by an area where users enter long-form text. After running a few of these documents through Textract, we found that, depending on how the responses are formatted, Textract has difficulty recognizing these fields as key-value pairs.

For example, if the value for one of these fields has one sentence per line (all within the box allotted for text entry), only the first line is associated with the section's question as a key-value pair. Another quirk: if a large block of text has been copied and pasted into the text entry box, Textract does not recognize it as a key-value pair at all. We did get the expected text when we used queries whose text was the full "key" label (Enter Statement of Interest Below), but this was hit or miss depending on the formatting of the long-form text.
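For reference, the query approach described above might look roughly like the following sketch. It assumes the boto3 Textract client and a synchronous `analyze_document` call on document bytes; the helper names (`build_query_request`, `extract_query_answers`, `analyze`) and the `statement_of_interest` alias are hypothetical and chosen only for illustration.

```python
def build_query_request(document_bytes, question, alias="statement_of_interest"):
    """Build keyword arguments for a Textract AnalyzeDocument call.

    The alias is a hypothetical label used to look the answer up later.
    """
    return {
        "Document": {"Bytes": document_bytes},
        "FeatureTypes": ["FORMS", "QUERIES"],
        "QueriesConfig": {"Queries": [{"Text": question, "Alias": alias}]},
    }


def extract_query_answers(response):
    """Map each query alias (or query text) to its list of answer strings."""
    blocks = {b["Id"]: b for b in response.get("Blocks", [])}
    answers = {}
    for block in blocks.values():
        if block.get("BlockType") != "QUERY":
            continue
        query = block.get("Query", {})
        name = query.get("Alias") or query.get("Text")
        texts = []
        # QUERY blocks point at their QUERY_RESULT blocks via ANSWER relationships.
        for rel in block.get("Relationships", []):
            if rel.get("Type") != "ANSWER":
                continue
            for result_id in rel.get("Ids", []):
                result = blocks.get(result_id, {})
                if result.get("BlockType") == "QUERY_RESULT":
                    texts.append(result.get("Text", ""))
        answers[name] = texts
    return answers


def analyze(document_bytes, question):
    """Call Textract and return the parsed query answers (needs AWS credentials)."""
    import boto3  # imported lazily so the parsing helpers work without boto3

    client = boto3.client("textract")
    response = client.analyze_document(**build_query_request(document_bytes, question))
    return extract_query_answers(response)
```

A query with no ANSWER relationship comes back as an empty list, which makes it easy to spot the documents where the long-form text was missed.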

We are wondering whether this behavior is due to the formatting of the input documents or is a quirk of Textract itself.

You can find a copy of the input document here.

asked a year ago · 323 views
1 Answer

Hello,

Thanks for using Amazon Textract. Ideally, the Textract Forms feature is used for key-value pair detection where the value ranges from a few words up to a few sentences [1]. I believe your requirement is for a long paragraph as the value. We would need sample filled-in forms for analysis, since Textract is a fully managed service that uses pre-trained models for OCR.
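To see exactly which words Textract attached to each key, you can walk the `KEY_VALUE_SET` blocks in the AnalyzeDocument response. The following is only a minimal sketch: it assumes a response dictionary from a FORMS analysis, and the helper names `_child_text` and `extract_key_values` are hypothetical.

```python
def _child_text(block, blocks_by_id):
    """Join the text of all WORD blocks that are children of the given block."""
    words = []
    for rel in block.get("Relationships", []):
        if rel.get("Type") != "CHILD":
            continue
        for child_id in rel.get("Ids", []):
            child = blocks_by_id.get(child_id, {})
            if child.get("BlockType") == "WORD":
                words.append(child.get("Text", ""))
    return " ".join(words)


def extract_key_values(response):
    """Return a dict mapping each detected key's text to its value's text."""
    blocks_by_id = {b["Id"]: b for b in response.get("Blocks", [])}
    pairs = {}
    for block in blocks_by_id.values():
        if block.get("BlockType") != "KEY_VALUE_SET":
            continue
        if "KEY" not in block.get("EntityTypes", []):
            continue
        key_text = _child_text(block, blocks_by_id)
        value_parts = []
        # A KEY block references its VALUE block(s) through a VALUE relationship.
        for rel in block.get("Relationships", []):
            if rel.get("Type") != "VALUE":
                continue
            for value_id in rel.get("Ids", []):
                value_block = blocks_by_id.get(value_id)
                if value_block is not None:
                    value_parts.append(_child_text(value_block, blocks_by_id))
        pairs[key_text] = " ".join(part for part in value_parts if part)
    return pairs
```

Comparing this output with the `LINE` blocks on the same page shows how much of a long answer was left out of the value.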

Regarding the text not being recognized, there are several possible causes, ranging from the DPI of the image to the language of the document [2]. Please go through them. I would also suggest reaching out to AWS Support [3] (Textract) with a detailed description of your issue and use case, along with sample documents. We will troubleshoot accordingly and involve the service team for suggestions on your exact requirement.

[1] https://docs.aws.amazon.com/textract/latest/dg/how-it-works-kvp.html
[2] https://docs.aws.amazon.com/textract/latest/dg/textract-best-practices.html
[3] https://docs.aws.amazon.com/awssupport/latest/user/case-management.html#creating-a-support-case

AWS
SUPPORT ENGINEER
Rakesh
answered a year ago
