AWS Textract detecting "0" as "O"


Hi Team,

I've been trying to extract data from a text-based PDF document. Most of the extraction works well, except for one specific case where "0" is returned as "O".

Example:

Actual Text - CMA CGM ARCTIC / 0TXCDE1MA

Extracted Text - CMA CGM ARCTIC / OTXCDE1MA

Is there something we can do to train Textract to return this character as 0 instead of O?

Asked 2 years ago, 584 views
1 Answer

Thank you for using Textract. Because Textract is a machine learning model, it may not always reach the expected accuracy on certain documents, and it may not be working well for this use case. However, the Textract team continuously updates the model to improve quality and cover more use cases. To help us improve the model for your documents, please open a customer support ticket and share the documents that show the issue so we can analyze it further. Additionally, please watch for model quality updates announced on the AWS Textract public release channel.

Please note that Textract is a built-in machine learning model and does not allow users to train or customize it for their own use cases. Hence, the Textract model cannot be trained to return the required result, since it uses an internal model that cannot be customized.
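Since the model itself cannot be adjusted, one possible workaround is to correct the extracted text on the caller's side. The following is only a minimal sketch and not a Textract feature. It assumes, purely for illustration, that the code after " / " is always 9 characters long and starts with a digit, so a leading letter "O" in that position is treated as a misread "0". The pattern and the function name `fix_leading_zero` are hypothetical and would need to be adapted to the real format of your documents.

```python
import re

# Hypothetical rule (an assumption, not taken from Textract or the document format):
# the token after "/ " has 9 characters and begins with a digit, so a leading
# letter "O" followed by 8 uppercase letters or digits is a misread zero.
MISREAD_ZERO = re.compile(r"(?<=/ )O(?=[A-Z0-9]{8}\b)")


def fix_leading_zero(text: str) -> str:
    """Replace a leading 'O' with '0' in tokens matching the assumed pattern."""
    return MISREAD_ZERO.sub("0", text)


print(fix_leading_zero("CMA CGM ARCTIC / OTXCDE1MA"))
```

A narrow, position-based rule like this leaves ordinary words that contain the letter "O" untouched, but it is only as reliable as the assumption about the code format.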

AWS
Support Engineer
Answered 2 years ago
