AWS Textract detecting "0" as "O"

Hi Team,

I've been trying to extract data from a text-based PDF document. Most of the extraction works well, except for one specific case where "0" is returned as "O".

Example:

Actual Text - CMA CGM ARCTIC / 0TXCDE1MA

Extracted Text - CMA CGM ARCTIC / OTXCDE1MA

Is there something we can do to train Textract to return this character as 0 instead of O?

asked 2 years ago, 584 views
1 Answer

Thank you for using Textract. Because Textract is a machine learning model, it may not reach the expected accuracy on every document, and it appears not to be handling this case correctly. The Textract team continuously updates the model to improve quality and cover more use cases. To help us improve the model for your documents, please open a customer support ticket and share the documents that show the issue so we can analyze it further. Also watch for model quality updates, which are announced on the AWS Textract public release channel.

Please note that Textract uses a built-in machine learning model that users cannot train or customize for their own use cases, so it is not possible to train Textract to return "0" instead of "O" here.
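Since the model cannot be customized, one possible workaround is to correct the text after extraction. The following is only a minimal sketch, not an official Textract feature: it assumes (hypothetically) that the code after " / " is always nine alphanumeric characters and always starts with a digit in your documents, so a leading letter "O" in that position can be treated as a misread "0". The pattern and the function name `fix_leading_zero` are illustrative and would need to be adapted to your real data.

```python
import re

# Assumed rule (hypothetical, adjust to your documents): the token that
# follows "/ " is 9 alphanumeric characters long and starts with a digit,
# so a leading letter "O" there is treated as a misread digit "0".
LEADING_O = re.compile(r"(?<=/ )O(?=[A-Z0-9]{8}\b)")


def fix_leading_zero(line: str) -> str:
    """Replace a leading letter O with digit 0 in tokens matching the assumed pattern."""
    return LEADING_O.sub("0", line)


print(fix_leading_zero("CMA CGM ARCTIC / OTXCDE1MA"))
```

The function takes the extracted text one line at a time as plain strings. Shorter tokens such as "OSLO" are left unchanged because they do not match the assumed nine-character length.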

AWS
SUPPORT ENGINEER
answered 2 years ago
