Amazon Textract

We are using the Amazon Textract AnalyzeDocument API to extract data from enrollment forms in PDF and JPEG format. For PDF forms that were filled in online, Textract returns incorrect results for single-select radio buttons and checkboxes, and it fails to recognize some text fields.

The attached screenshot shows an online-filled PDF form in which a single-select checkbox is selected, but Textract returns "Not Selected" in the response.


If we upload the same form in JPEG format, Textract correctly recognizes the single-select radio buttons and checkboxes, as well as the first name and last name fields.

How can we fix this? Any help in resolving these issues would be appreciated.

Asked 2 months ago, viewed 152 times
1 Answer

To troubleshoot this issue with Textract, consider the following:

  • The document should be in a language that Textract supports (English, Spanish, Italian, Portuguese, French, German). Accuracy may be lower for other languages.
  • Provide high-quality images (150 DPI or higher) in a supported format such as PDF, JPEG, or PNG. Converting or downsampling an image before analysis can degrade the results.
  • Single-select radio buttons and checkboxes can be difficult for Textract to interpret correctly. You may need to post-process the results to determine which option was actually selected.
  • If certain text fields are not recognized, their font, size, or layout may be making them harder to extract. Try preprocessing the document, or the affected portion of it, to clean it up before analysis.
  • The Textract console shows bounding box information that can help you validate extractions. You can also download the full JSON response for deeper analysis.
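
As a rough illustration of the post-processing mentioned above, here is a minimal Python sketch that reads an AnalyzeDocument response (for example, the JSON downloaded from the console) and reports the selection status of each form key. It assumes the response was produced with the FORMS feature type and follows the documented block layout (KEY_VALUE_SET, WORD, and SELECTION_ELEMENT blocks linked through Relationships). The helper names `selection_fields` and `pick_single` are hypothetical, not part of any AWS SDK, and the rule "exactly one option selected" is an assumption about how a single-select group should be validated.

```python
def _related_ids(block, rel_type):
    """Collect the block Ids listed under one relationship type (CHILD or VALUE)."""
    ids = []
    for rel in block.get("Relationships") or []:
        if rel.get("Type") == rel_type:
            ids.extend(rel.get("Ids", []))
    return ids


def selection_fields(response):
    """Map each form key's text to the status of the selection element in its value.

    Keys whose value contains no SELECTION_ELEMENT (plain text fields) are skipped.
    """
    blocks = {b["Id"]: b for b in response.get("Blocks", [])}
    result = {}
    for block in blocks.values():
        is_key = (
            block.get("BlockType") == "KEY_VALUE_SET"
            and "KEY" in block.get("EntityTypes", [])
        )
        if not is_key:
            continue
        # The key's label is the concatenation of its child WORD blocks.
        key_text = " ".join(
            blocks[i]["Text"]
            for i in _related_ids(block, "CHILD")
            if i in blocks and blocks[i].get("BlockType") == "WORD"
        )
        # Follow KEY -> VALUE -> CHILD to reach the selection element.
        for value_id in _related_ids(block, "VALUE"):
            value_block = blocks.get(value_id, {})
            for child_id in _related_ids(value_block, "CHILD"):
                child = blocks.get(child_id, {})
                if child.get("BlockType") == "SELECTION_ELEMENT":
                    result[key_text] = {
                        "status": child.get("SelectionStatus"),
                        "confidence": child.get("Confidence"),
                    }
    return result


def pick_single(fields, options):
    """Return the one selected option of a single-select group, or None.

    None signals that zero or several options were reported as SELECTED,
    which is a hint that the group needs manual review.
    """
    selected = [o for o in options if fields.get(o, {}).get("status") == "SELECTED"]
    return selected[0] if len(selected) == 1 else None
```

To use it, load the saved response with `json.load` and pass the resulting dictionary to `selection_fields`, then call `pick_single` with the labels of one radio group as they appear on your form. Groups that come back as None, or elements with a low confidence value, are good candidates for comparing against the bounding boxes in the console.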
Expert
Answered 2 months ago
