Textract does not recognize image text

0

I want to extract the text from an image, but Textract doesn't read anything. I have tried similar images, and it does read the information in those. Do you know what could be preventing it from reading this image?

[Image: Textract result]

Asked a year ago, 643 views
2 Answers
1

When doing OCR (Optical Character Recognition), you sometimes need to pre-process the image, especially when some characters are not well defined.

In your case, I would:

  1. crop the image to focus on the non-white part
  2. rotate the image so that it is as upright as possible
  3. increase the contrast by 20-30%

After these changes, as you can see from my result, Textract recognizes everything :)

You could use a Lambda function or a SageMaker Processing Python job with OpenCV or a similar library to pre-process the images. Pre-processing like this is usually a best practice for obtaining better results. Please check this blog: https://aws.amazon.com/it/blogs/machine-learning/process-text-and-images-in-pdf-documents-with-amazon-textract/ or this notebook: https://github.com/aws-samples/textract-visual-removal/blob/main/visual_removal_canny_edge_detector.ipynb
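
As a rough illustration of what such a Lambda or processing job might do once the image has been pre-processed, the sketch below sends the image bytes to Textract and keeps the detected lines. It assumes boto3 with credentials and a region already configured, and it uses the synchronous `detect_document_text` call; the helper names are hypothetical, and table extraction would need `analyze_document` with `FeatureTypes=["TABLES"]` instead.

```python
def lines_from_textract(response):
    """Return the text of every LINE block in a Textract response dict."""
    return [
        block["Text"]
        for block in response.get("Blocks", [])
        if block.get("BlockType") == "LINE"
    ]


def detect_lines(image_bytes, client=None):
    """Send PNG or JPEG bytes to Textract and return the detected lines.

    `client` is optional so that a stub can be injected for testing;
    by default a real boto3 Textract client is created.
    """
    if client is None:
        import boto3  # imported lazily so the parsing helper works without it
        client = boto3.client("textract")
    response = client.detect_document_text(Document={"Bytes": image_bytes})
    return lines_from_textract(response)
```

Comparing the lines returned for the original and the pre-processed image is a quick way to check whether the pre-processing actually helps.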

[Image: modified image]

The output in Textract: [Image: Textract Tables output]

[if my response was useful, please click the "Accept" tick]

AWS
Answered a year ago
0

For this use case, consider using Amazon Rekognition: https://docs.aws.amazon.com/rekognition/latest/dg/text-detection.html
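
A minimal sketch of that alternative, assuming boto3 with credentials and a region configured: it calls Rekognition's `detect_text` and keeps line-level detections above a confidence cutoff. The helper names and the 80% cutoff are hypothetical choices, not something this answer prescribes.

```python
def lines_from_rekognition(response, min_confidence=80.0):
    """Return line-level text from a Rekognition detect_text response dict."""
    return [
        detection["DetectedText"]
        for detection in response.get("TextDetections", [])
        if detection.get("Type") == "LINE"
        and detection.get("Confidence", 0.0) >= min_confidence
    ]


def detect_text_lines(image_bytes, client=None, min_confidence=80.0):
    """Send PNG or JPEG bytes to Rekognition and return the detected lines.

    `client` is optional so that a stub can be injected for testing;
    by default a real boto3 Rekognition client is created.
    """
    if client is None:
        import boto3  # imported lazily so the parsing helper works without it
        client = boto3.client("rekognition")
    response = client.detect_text(Image={"Bytes": image_bytes})
    return lines_from_rekognition(response, min_confidence)
```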

Sanjay
Answered a year ago
