Textract TextType recognition in DetectDocumentText


Hi.

Should a DetectDocumentText job report, for each WORD, whether it is HANDWRITING or PRINTED? I noticed the following behavior with the different documents I tested:

  • If the document is mostly printed but contains some handwritten text, every WORD is detected as PRINTED
  • If the document is mostly handwritten but also contains some printed lines/words, every WORD is still returned as PRINTED
  • If the document is fully handwritten, every WORD is detected as HANDWRITING

What I want is to detect whether a document contains handwritten WORDs and, if so, which ones. Is this possible using just DetectDocumentText? Note: I use the asynchronous operation because I mostly work with multipage PDF files.

Elvar

Asked 10 months ago, 213 views
1 Answer

Hi,

Thanks for using Amazon Textract.

I understand that you would like to know whether DetectDocumentText can tell you if a document contains handwritten WORDs and which ones they are.

According to the documentation at https://docs.aws.amazon.com/textract/latest/dg/API_Block.html, TextType is supposed to be correct and specific per WORD. I tested this on my end, and each word is returned as either PRINTED or HANDWRITING.
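To check this on your own documents, a minimal sketch like the one below may help. It assumes `client` is a boto3 Textract client and that the asynchronous job has already finished. The helper names `iter_blocks`, `handwritten_words`, and `text_type_counts` are made up for illustration and are not part of any AWS SDK. It pages through the job results with `NextToken` and keeps only the WORD blocks whose `TextType` is HANDWRITING.

```python
from collections import Counter


def iter_blocks(client, job_id):
    """Yield every Block of a finished async text-detection job.

    `client` is assumed to be a boto3 Textract client (or any object that
    exposes a compatible get_document_text_detection method).
    """
    kwargs = {"JobId": job_id}
    while True:
        resp = client.get_document_text_detection(**kwargs)
        status = resp.get("JobStatus")
        if status != "SUCCEEDED":
            # This sketch does not poll; it expects a completed job.
            raise RuntimeError(f"job {job_id} is not finished: {status}")
        yield from resp.get("Blocks", [])
        token = resp.get("NextToken")
        if not token:
            break
        kwargs["NextToken"] = token  # request the next page of results


def handwritten_words(blocks):
    """Return page, text and confidence for each WORD tagged HANDWRITING."""
    return [
        {
            "Page": b.get("Page"),
            "Text": b.get("Text"),
            "Confidence": b.get("Confidence"),
        }
        for b in blocks
        if b.get("BlockType") == "WORD" and b.get("TextType") == "HANDWRITING"
    ]


def text_type_counts(blocks):
    """Count WORD blocks per TextType, to spot 'everything is PRINTED' cases."""
    return Counter(
        b.get("TextType") for b in blocks if b.get("BlockType") == "WORD"
    )
```

With a real job, you would create the client with `boto3.client("textract")` and pass the JobId returned by StartDocumentTextDetection. Comparing the output of `text_type_counts` across your mixed documents should show whether any WORD is tagged HANDWRITING at all.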

As a machine learning service, Amazon Textract may not achieve the desired accuracy on certain documents.

The current Textract model may not work well for your use case. However, please note that the Textract team continuously updates the model to improve quality and cover more use cases.

I would also suggest reaching out to AWS Support [1] (Textract) with a detailed description of your issue and use case, along with sample documents. We will troubleshoot accordingly and involve the service team for suggestions on your exact requirement.

References:

[1] Open a support case with AWS: https://console.aws.amazon.com/support/home?#/case/create

[+] Multiple use cases for handwritten forms: https://aws.amazon.com/blogs/machine-learning/extracting-handwritten-information-through-amazon-textract/

[+] Best practices: https://docs.aws.amazon.com/textract/latest/dg/textract-best-practices.html

AWS
Answered 10 months ago
