AWS Textract detecting "0" as "O"


Hi Team,

I've been trying to extract data from a text-based PDF document. Most of the extraction works well, except for one case where "0" is returned as "O".

Example:

Actual Text - CMA CGM ARCTIC / 0TXCDE1MA

Extracted Text - CMA CGM ARCTIC / OTXCDE1MA

Is there something we can do to train Textract to return this element as 0 instead of O?

Asked 2 years ago · Viewed 584 times
1 answer

Thank you for using Textract. Because Textract is a machine learning model, it may not always reach the expected accuracy on certain documents, and it may not perform well for this particular use case. The Textract team continuously updates the model to improve quality and to cover more use cases. To help us improve the model for your documents, please open a customer support ticket and share the documents on which you see the issue so we can analyze it further. Also watch for announcements about model quality updates on the AWS Textract public release channel.
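Assuming you call Textract through boto3 (`textract.detect_document_text`), the response is a dict with a `Blocks` list in which each `WORD` block carries `Text` and `Confidence`. A sketch like the following could help locate the affected words before sharing documents in a ticket. The heuristic (a word that mixes letters and digits and contains O or 0, or whose confidence is below a threshold) and the 95.0 threshold are assumptions, and a misread character can still be reported with high confidence, so this only narrows down candidates for a manual check.

```python
from typing import Any, Dict, List

# Characters that Textract may confuse; O/0 is the case reported here.
AMBIGUOUS = {"O", "0"}


def flag_suspect_words(response: Dict[str, Any], min_confidence: float = 95.0) -> List[Dict[str, Any]]:
    """Return WORD blocks worth a manual check.

    A word is flagged if it mixes letters and digits and contains O or 0,
    or if its Confidence (0-100) is below min_confidence. The 95.0 default
    is an arbitrary example threshold, not an AWS recommendation.
    """
    suspects = []
    for block in response.get("Blocks", []):
        if block.get("BlockType") != "WORD":
            continue  # skip PAGE and LINE blocks; only inspect single words
        text = block.get("Text", "")
        mixed = any(c.isdigit() for c in text) and any(c.isalpha() for c in text)
        has_ambiguous = bool(AMBIGUOUS & set(text))
        low_confidence = block.get("Confidence", 100.0) < min_confidence
        if (mixed and has_ambiguous) or low_confidence:
            suspects.append({"text": text, "confidence": block.get("Confidence")})
    return suspects


if __name__ == "__main__":
    # Made-up sample shaped like a DetectDocumentText response (not real output).
    sample = {"Blocks": [
        {"BlockType": "LINE", "Text": "CMA CGM ARCTIC / OTXCDE1MA", "Confidence": 99.1},
        {"BlockType": "WORD", "Text": "ARCTIC", "Confidence": 99.5},
        {"BlockType": "WORD", "Text": "OTXCDE1MA", "Confidence": 98.7},
    ]}
    print(flag_suspect_words(sample))  # [{'text': 'OTXCDE1MA', 'confidence': 98.7}]
```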

Please note that Textract is a built-in machine learning model and does not allow users to train or customize it for their use cases. Because it uses an internal model that cannot be customized, it cannot be trained to return the required result for this field.
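Since the model itself can't be customized, one workaround is to correct known fields after extraction. The sketch below assumes you have an independent list of valid values for the field; the `known_values` set (for example, voyage codes from your own booking data) and the helper names are hypothetical. It tries every O/0 spelling of the extracted token and accepts a correction only when exactly one spelling is a known value, so ambiguous or unknown tokens are left for review rather than guessed.

```python
from itertools import product
from typing import Iterator, Optional, Set

# Confusable characters: each maps to the spellings worth trying.
# Extending this map (e.g. "I"/"1") is an assumption, not something Textract provides.
CONFUSABLE = {"O": "O0", "0": "0O"}


def candidates(token: str) -> Iterator[str]:
    """Yield every spelling of token obtained by swapping O and 0."""
    options = [CONFUSABLE.get(ch, ch) for ch in token]
    for combo in product(*options):
        yield "".join(combo)


def correct_token(token: str, known_values: Set[str]) -> Optional[str]:
    """Return the single known value reachable by O/0 swaps, else None."""
    matches = {c for c in candidates(token) if c in known_values}
    return matches.pop() if len(matches) == 1 else None


if __name__ == "__main__":
    known_values = {"0TXCDE1MA"}  # hypothetical reference list
    print(correct_token("OTXCDE1MA", known_values))  # 0TXCDE1MA
```

The number of candidates doubles with each O or 0 in the token, which is fine for short codes like this one but would need a smarter lookup for long strings.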

AWS
Support Engineer
Answered 2 years ago
