AWS Textract detecting "0" as "O"


Hi Team,

I've been trying to extract data from a text-based PDF document. Most of the extraction works well, except for one specific case where "0" is returned as "O".

Example:

Actual Text - CMA CGM ARCTIC / 0TXCDE1MA

Extracted Text - CMA CGM ARCTIC / OTXCDE1MA

Is there something we can do to train Textract to return this character as 0 instead of O?

Asked 2 years ago, 584 views
1 answer

Thank you for using Textract. Because Textract is a machine learning model, it may not always reach the expected accuracy on certain documents, and it may not be handling this use case correctly. The Textract team continuously updates the model to improve quality and cover more use cases. To help us improve the model for your documents, please open a customer support ticket and share the documents that show the issue so we can analyze it further. Also, please watch for model quality updates announced on the AWS Textract public release channel.

Please note that Textract uses a built-in, internal machine learning model and does not allow users to train or customize it for their use cases. We therefore cannot train the Textract model to return the required result.
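Since the model itself cannot be customized, one possible workaround is to correct the extracted text on the client side after Textract returns it. The snippet below is only a minimal sketch of that idea, not a Textract feature. It assumes, hypothetically, that the affected value is a 9-character uppercase code that follows " / ", starts with a zero, and contains at least one other digit. The pattern and the function name `fix_leading_zero` are illustrative and would need to be adapted to your real data.

```python
import re

# Hypothetical rule (an assumption, not something Textract provides):
# a code that follows "/ ", begins with the letter "O", is followed by exactly
# 8 more uppercase letters or digits, and has a digit before any non-letter.
# Requiring a digit avoids rewriting ordinary words that start with "O".
LEADING_O_IN_CODE = re.compile(r"(?<=/ )O(?=[A-Z0-9]{8}\b)(?=[A-Z]*\d)")


def fix_leading_zero(text: str) -> str:
    """Replace a leading letter 'O' with the digit '0' in matching codes."""
    return LEADING_O_IN_CODE.sub("0", text)


print(fix_leading_zero("CMA CGM ARCTIC / OTXCDE1MA"))
# prints: CMA CGM ARCTIC / 0TXCDE1MA
```

A rule like this is only safe when the format of the field is known in advance. Text that does not match the pattern is returned unchanged.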

AWS
Support Engineer
Answered 2 years ago
