Amazon Textract


We are using the AnalyzeDocument API of Amazon Textract to extract data from enrollment forms in PDF and JPEG formats. For PDF forms that were filled in online, Textract returns incorrect results for single-select radio buttons and checkboxes, and it fails to recognize a few text fields.

The screenshot below shows an online-filled PDF form in which a single-select checkbox is selected, but Textract returns "Not Selected" in the response.

Enter image description here

If we upload the same form in JPEG format, Textract correctly recognizes the single-select radio buttons and checkboxes, as well as the first name and last name fields.

Enter image description here

How can we fix this? Please help us resolve these issues, as Textract is not returning the correct response.

Asked 2 months ago, 152 views
1 Answer

To troubleshoot this issue with Textract, check the following:

  • Make sure the document uses a language supported by Textract (English, Spanish, Italian, Portuguese, French, German). Accuracy may be lower for other languages.
  • Provide high-quality input (150 DPI or higher) in a supported format such as PDF, JPEG, or PNG. Converting or downsampling the document before analysis can affect the results.
  • Single-select radio buttons and checkboxes can be difficult for Textract to interpret correctly. You may need to post-process the results, for example by checking the confidence score reported for each selection element, to determine which option was actually selected.
  • If certain text fields are not being recognized, the font, size, or layout of those fields may make them harder to extract. Try preprocessing the document, or the affected portion of it, to clean it up before analysis.
  • The Textract console shows bounding box information that can help you validate extractions. You can also download the full JSON response for deeper analysis.
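
As a rough illustration of the post-processing step, the sketch below walks an AnalyzeDocument JSON response (for example, the one downloaded from the console) and lists each form key that has a selection element, together with its status and confidence. It relies on the documented `Blocks` structure (`KEY_VALUE_SET`, `SELECTION_ELEMENT`, `SelectionStatus`, `Relationships`). The function names, the sample data, and the 90.0 review threshold are assumptions made for this example, not recommended values.

```python
from typing import Any, Dict, List

Block = Dict[str, Any]


def related_ids(block: Block, rel_type: str) -> List[str]:
    """Return the Ids listed under a relationship type (for example CHILD or VALUE)."""
    ids: List[str] = []
    for rel in block.get("Relationships") or []:
        if rel.get("Type") == rel_type:
            ids.extend(rel.get("Ids", []))
    return ids


def block_text(block: Block, by_id: Dict[str, Block]) -> str:
    """Join the WORD children of a block into a single label string."""
    words = []
    for child_id in related_ids(block, "CHILD"):
        child = by_id.get(child_id)
        if child and child.get("BlockType") == "WORD":
            words.append(child.get("Text", ""))
    return " ".join(words)


def selection_fields(response: Dict[str, Any],
                     min_confidence: float = 90.0) -> List[Dict[str, Any]]:
    """List form keys whose value contains a checkbox or radio button.

    min_confidence is a hypothetical threshold: anything below it is
    flagged for manual review rather than trusted.
    """
    blocks = response.get("Blocks", [])
    by_id = {b["Id"]: b for b in blocks}
    results = []
    for block in blocks:
        is_key = (block.get("BlockType") == "KEY_VALUE_SET"
                  and "KEY" in block.get("EntityTypes", []))
        if not is_key:
            continue
        label = block_text(block, by_id)
        for value_id in related_ids(block, "VALUE"):
            value = by_id.get(value_id, {})
            for child_id in related_ids(value, "CHILD"):
                child = by_id.get(child_id)
                if child and child.get("BlockType") == "SELECTION_ELEMENT":
                    confidence = child.get("Confidence", 0.0)
                    results.append({
                        "label": label,
                        "status": child.get("SelectionStatus"),
                        "confidence": confidence,
                        "needs_review": confidence < min_confidence,
                    })
    return results


# Made-up response fragment, shaped like real AnalyzeDocument output.
sample_response = {
    "Blocks": [
        {"Id": "w1", "BlockType": "WORD", "Text": "Male"},
        {"Id": "s1", "BlockType": "SELECTION_ELEMENT",
         "SelectionStatus": "NOT_SELECTED", "Confidence": 62.5},
        {"Id": "k1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["KEY"],
         "Relationships": [{"Type": "VALUE", "Ids": ["v1"]},
                           {"Type": "CHILD", "Ids": ["w1"]}]},
        {"Id": "v1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["VALUE"],
         "Relationships": [{"Type": "CHILD", "Ids": ["s1"]}]},
    ]
}

for field in selection_fields(sample_response):
    print(field)
```

Running this over the responses for the PDF and JPEG versions of the same form should make it easier to see which selection elements differ and whether the incorrect ones come back with low confidence.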
Expert
Answered 2 months ago
