Amazon Textract

We are using the Amazon Textract AnalyzeDocument API to extract data from enrollment forms in PDF and JPEG format. We observed that for PDF forms filled in online, Textract returns incorrect results for single-select radio buttons/checkboxes and fails to recognize some text fields.

Please see the attached screenshot of a PDF form filled in online: a single-select checkbox is selected, but Textract returns "Not Selected" in the response.

If we upload the same form in JPEG format, Textract correctly recognizes the single-select radio buttons/checkboxes as well as the first name and last name fields, and returns the correct response.

How can we fix this? Please help us resolve these issues, since Textract is not returning the correct response.

asked 2 months ago · 138 views
1 Answer

To improve Textract's results for these forms, check the following:

  • Ensure the document uses a language supported by Textract (English, Spanish, Italian, Portuguese, French, German). Accuracy may be lower for other languages.
  • Provide high-quality images (150 DPI or higher) in a supported format such as PDF, JPEG, or PNG. Converting or downsampling the image before analysis can degrade results.
  • Single-select radio buttons and checkboxes can be difficult for Textract to interpret correctly. You may need to post-process the results to determine which option was actually selected.
  • If certain text fields are not recognized, their font, size, or layout may make them harder to extract. Try preprocessing the document, or just the affected region, before analysis to clean it up.
  • The Textract console shows bounding-box information that can help you validate extractions. You can also download the full JSON response for deeper analysis.
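As an illustration of the post-processing mentioned above: with the AnalyzeDocument FORMS feature, checkboxes and radio buttons come back as SELECTION_ELEMENT blocks whose SelectionStatus is SELECTED or NOT_SELECTED, linked from KEY_VALUE_SET blocks. The following is a minimal sketch, not an official AWS utility. It assumes you already have the response dict (for example from boto3's `analyze_document(..., FeatureTypes=["FORMS"])`), that each option's label matches the key text exactly (real keys may carry punctuation such as a trailing colon), and that the helper names and the hand-built sample data are hypothetical.

```python
from typing import Dict, Iterator, List, Optional


def _related(block: dict, by_id: Dict[str, dict], rel_type: str) -> Iterator[dict]:
    """Yield the blocks linked from `block` by relationships of type `rel_type`."""
    for rel in block.get("Relationships", []):
        if rel["Type"] == rel_type:
            for block_id in rel["Ids"]:
                yield by_id[block_id]


def _text_of(block: dict, by_id: Dict[str, dict]) -> str:
    """Join the WORD children of a block into one string."""
    return " ".join(
        child["Text"]
        for child in _related(block, by_id, "CHILD")
        if child["BlockType"] == "WORD"
    )


def selection_fields(response: dict) -> Dict[str, str]:
    """Map each form key's text to the SelectionStatus of the checkbox or
    radio button (SELECTION_ELEMENT) found in its value, if any."""
    by_id = {b["Id"]: b for b in response["Blocks"]}
    fields: Dict[str, str] = {}
    for block in response["Blocks"]:
        if block["BlockType"] != "KEY_VALUE_SET" or "KEY" not in block.get("EntityTypes", []):
            continue
        key_text = _text_of(block, by_id)
        for value in _related(block, by_id, "VALUE"):
            for child in _related(value, by_id, "CHILD"):
                if child["BlockType"] == "SELECTION_ELEMENT":
                    fields[key_text] = child["SelectionStatus"]
    return fields


def resolve_single_select(fields: Dict[str, str], options: List[str]) -> Optional[str]:
    """Return the single SELECTED option of a group, or None when zero or
    several options are SELECTED (route those forms to human review)."""
    selected = [opt for opt in options if fields.get(opt) == "SELECTED"]
    return selected[0] if len(selected) == 1 else None


def _fake_option(n: int, label: str, status: str) -> List[dict]:
    """Build a KEY/VALUE pair whose value holds one selection element.
    Hypothetical, hand-made illustration data, not real Textract output."""
    return [
        {"Id": f"k{n}", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["KEY"],
         "Relationships": [{"Type": "VALUE", "Ids": [f"v{n}"]},
                           {"Type": "CHILD", "Ids": [f"w{n}"]}]},
        {"Id": f"v{n}", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["VALUE"],
         "Relationships": [{"Type": "CHILD", "Ids": [f"s{n}"]}]},
        {"Id": f"w{n}", "BlockType": "WORD", "Text": label},
        {"Id": f"s{n}", "BlockType": "SELECTION_ELEMENT", "SelectionStatus": status},
    ]


# A tiny fake response: a two-option single-select group.
SAMPLE_RESPONSE = {
    "Blocks": _fake_option(1, "Male", "NOT_SELECTED")
    + _fake_option(2, "Female", "SELECTED")
}

if __name__ == "__main__":
    fields = selection_fields(SAMPLE_RESPONSE)
    print(fields)
    print(resolve_single_select(fields, ["Male", "Female"]))
```

Returning None instead of guessing when a group has zero or several SELECTED options is a deliberate choice: for enrollment forms it is usually safer to flag the form for review than to pick an option silently.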
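To apply the 150 DPI guideline to a JPEG, note that the resolution that matters is pixels per inch of the physical page. A quick sketch, assuming you know the page's printed width (8.5 in for US Letter is an assumption about these forms); the function names are hypothetical:

```python
def effective_dpi(pixel_width: int, page_width_in: float) -> float:
    """Horizontal resolution of a page image: pixels per inch of paper."""
    return pixel_width / page_width_in


def meets_min_dpi(pixel_width: int, page_width_in: float, min_dpi: float = 150.0) -> bool:
    """True if the image resolution is at or above the recommended minimum."""
    return effective_dpi(pixel_width, page_width_in) >= min_dpi


if __name__ == "__main__":
    # A US Letter page (8.5 in wide) scanned 1275 px wide is exactly 150 DPI;
    # the same page saved 1000 px wide is only about 118 DPI.
    print(effective_dpi(1275, 8.5), meets_min_dpi(1275, 8.5))
    print(round(effective_dpi(1000, 8.5), 1), meets_min_dpi(1000, 8.5))
```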
EXPERT
answered 2 months ago
