Amazon Textract


We are using the AnalyzeDocument API of Amazon Textract to extract data from enrollment forms in PDF and JPEG format. We observed that for PDF forms filled in online, Textract returns incorrect results for single-select radio buttons and checkboxes, and fails to recognize a few text fields.

The attached screenshot shows a PDF form filled in online where a single-select checkbox is selected, but Textract returns "Not Selected" in the response.


If we upload the same form in JPEG format, Textract correctly recognizes the single-select radio buttons and checkboxes, as well as the first name and last name fields.

How can we fix this? Any help resolving these issues would be appreciated.

Asked 2 months ago, 151 views
1 Answer

To troubleshoot this issue with Textract, check the following:

  • Ensure the document uses a language supported by Textract (English, Spanish, Italian, Portuguese, French, German). Accuracy may be lower for other languages.
  • Provide high-quality images (150 DPI or higher) in a format like PDF, JPEG or PNG. Converting or downsampling the image before analysis could impact results.
  • Single select radio buttons and checkboxes can sometimes be challenging for Textract to interpret correctly. You may need to do additional post-processing on the results to determine which option was actually selected.
  • If certain text fields are not being recognized, the font, size or layout of those fields may make them harder to extract. Try preprocessing the document or that portion of the document before analyzing to clean it up.
  • The Textract console provides bounding box information that can help validate extractions. You can also download the full JSON response for deeper analysis as needed.
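
As a rough illustration of the post-processing and JSON inspection mentioned above, the sketch below walks the `Blocks` list of an AnalyzeDocument response (requested with the FORMS feature), follows each KEY block to its VALUE block, and reports the `SelectionStatus` and `Confidence` of any `SELECTION_ELEMENT` it finds. The helper name `selection_fields`, the `min_confidence` threshold of 90, and the tiny `sample` response are assumptions made for this example; they are not part of Textract itself.

```python
# Minimal sketch: list checkbox/radio results from a Textract
# AnalyzeDocument response (FeatureTypes=["FORMS"]).
# `selection_fields` and `min_confidence` are hypothetical names for this example.

def selection_fields(response, min_confidence=90.0):
    """Return one dict per selection element linked to a form key."""
    blocks = {b["Id"]: b for b in response.get("Blocks", [])}

    def related(block, rel_type):
        # Resolve the block IDs listed under one relationship type.
        ids = []
        for rel in block.get("Relationships") or []:
            if rel["Type"] == rel_type:
                ids.extend(rel["Ids"])
        return [blocks[i] for i in ids if i in blocks]

    results = []
    for block in blocks.values():
        is_key = (block["BlockType"] == "KEY_VALUE_SET"
                  and "KEY" in block.get("EntityTypes", []))
        if not is_key:
            continue
        # The label is built from the WORD children of the KEY block.
        label = " ".join(c["Text"] for c in related(block, "CHILD")
                         if c["BlockType"] == "WORD")
        for value in related(block, "VALUE"):
            for child in related(value, "CHILD"):
                if child["BlockType"] != "SELECTION_ELEMENT":
                    continue
                confidence = child.get("Confidence", 0.0)
                results.append({
                    "label": label,
                    "selected": child["SelectionStatus"] == "SELECTED",
                    "confidence": confidence,
                    # Flag low-confidence results for manual review.
                    "needs_review": confidence < min_confidence,
                })
    return results


# Hand-written sample in the shape of a Textract response (illustrative only).
sample = {"Blocks": [
    {"Id": "k1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["KEY"],
     "Relationships": [{"Type": "VALUE", "Ids": ["v1"]},
                       {"Type": "CHILD", "Ids": ["w1"]}]},
    {"Id": "w1", "BlockType": "WORD", "Text": "Male"},
    {"Id": "v1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["VALUE"],
     "Relationships": [{"Type": "CHILD", "Ids": ["s1"]}]},
    {"Id": "s1", "BlockType": "SELECTION_ELEMENT",
     "SelectionStatus": "SELECTED", "Confidence": 62.5},
]}

for field in selection_fields(sample):
    print(field)
```

Running the same helper on the responses for the PDF and JPEG versions of a form makes it easier to see which fields differ and which ones Textract itself reports with low confidence.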
Expert
Answered 2 months ago
