Textract TextType recognition in DetectDocumentText


Hi.

Should DetectDocumentText report, for each WORD, whether it is HANDWRITING or PRINTED? I noticed the following behavior with the different documents I tried:

  • If the document is mostly "PRINTED" but has some handwritten text in it, every WORD is detected as PRINTED
  • If the document is mostly "HANDWRITING" but also has some PRINTED lines/words in it, every WORD is still detected as PRINTED
  • If the document is fully "HANDWRITING", every WORD is detected as HANDWRITING

What I want to do is detect whether, and which, WORDs in the document are handwritten. Is this possible using just DetectDocumentText? Note: I use the async task because I mostly work with multipage PDF files.

Elvar

asked 10 months ago, 206 views
1 Answer

Hi,

Thank you for using Amazon Textract.

I understand that you would like to know whether DetectDocumentText can tell you if, and which, WORDs in a document are handwritten.

According to the documentation at https://docs.aws.amazon.com/textract/latest/dg/API_Block.html, the TextType value is meant to be reported for each word individually. I tested this on my end, and each WORD was returned with its own TextType of either PRINTED or HANDWRITING.
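As a rough illustration of reading the per-word value, the sketch below filters a list of Block dictionaries for WORD blocks whose TextType is HANDWRITING. The helper name handwritten_words and the sample_blocks data are made up for this example; the sample is hand-written to resemble a response and is not real Textract output.

```python
from typing import Any, Dict, List


def handwritten_words(blocks: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return the WORD blocks whose TextType is HANDWRITING.

    Hypothetical helper: `blocks` is assumed to be the "Blocks" list
    from a text-detection response.
    """
    return [
        block
        for block in blocks
        if block.get("BlockType") == "WORD"
        and block.get("TextType") == "HANDWRITING"
    ]


# Illustrative sample shaped like a response (not real service output).
sample_blocks = [
    {"BlockType": "PAGE", "Page": 1},
    {"BlockType": "LINE", "Text": "Name: Jane", "Page": 1},
    {"BlockType": "WORD", "Text": "Name:", "TextType": "PRINTED", "Page": 1},
    {"BlockType": "WORD", "Text": "Jane", "TextType": "HANDWRITING", "Page": 1},
]

for word in handwritten_words(sample_blocks):
    # Print the page number and text of each handwritten word.
    print(word.get("Page"), word["Text"])
```

If every WORD in a mixed document comes back with the same TextType, this filter returns either nothing or everything, which matches the behavior described in the question.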

As a machine learning service, Amazon Textract may not achieve the desired accuracy on certain documents.

The current Textract model may not work well for your use case. However, please note that the Textract team continuously updates the model to improve quality and cover more use cases.
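To check how the labels are distributed across a multipage PDF processed asynchronously, a sketch like the following may help. It assumes a boto3 Textract client is passed in and that the job was started with start_document_text_detection; the function names iter_job_blocks and text_type_counts_by_page are hypothetical, and the polling is deliberately simplistic.

```python
import time
from collections import Counter
from typing import Any, Dict, Iterable, Iterator


def iter_job_blocks(
    client: Any, job_id: str, poll_seconds: float = 5.0
) -> Iterator[Dict[str, Any]]:
    """Yield every block of an async text-detection job.

    `client` is assumed to be boto3.client("textract"). Results are
    paginated, so NextToken is followed until it is absent.
    """
    next_token = None
    while True:
        kwargs = {"JobId": job_id}
        if next_token:
            kwargs["NextToken"] = next_token
        response = client.get_document_text_detection(**kwargs)
        status = response["JobStatus"]
        if status == "IN_PROGRESS":
            time.sleep(poll_seconds)  # naive polling, for illustration only
            continue
        if status == "FAILED":
            raise RuntimeError(response.get("StatusMessage", "Textract job failed"))
        yield from response.get("Blocks", [])
        next_token = response.get("NextToken")
        if not next_token:
            return


def text_type_counts_by_page(
    blocks: Iterable[Dict[str, Any]]
) -> Dict[int, Counter]:
    """Count WORD blocks per page, grouped by their TextType."""
    counts: Dict[int, Counter] = {}
    for block in blocks:
        if block.get("BlockType") != "WORD":
            continue
        page = block.get("Page", 1)
        counts.setdefault(page, Counter())[block.get("TextType", "UNKNOWN")] += 1
    return counts


# Usage (bucket and file names are placeholders):
#   import boto3
#   client = boto3.client("textract")
#   job = client.start_document_text_detection(
#       DocumentLocation={"S3Object": {"Bucket": "my-bucket", "Name": "scan.pdf"}}
#   )
#   print(text_type_counts_by_page(iter_job_blocks(client, job["JobId"])))
```

A per-page tally like this is also a compact way to describe the observed behavior when sharing sample documents with support.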

I would also suggest reaching out to AWS Support [1] for Textract with a detailed description of your issue and use case, along with sample documents. We will troubleshoot from there and involve the service team for suggestions on your exact requirement.

References:

[1] Open a support case with AWS: https://console.aws.amazon.com/support/home?#/case/create

[+] Use cases for handwritten forms: https://aws.amazon.com/blogs/machine-learning/extracting-handwritten-information-through-amazon-textract/

[+] Best practices: https://docs.aws.amazon.com/textract/latest/dg/textract-best-practices.html

AWS
answered 10 months ago
