AWS Textract detecting "0" as "O"

Hi Team,

I've been trying to extract data from a text-based PDF document. Most of the extraction works well, except for one specific case where "0" is returned as "O".

Example:

Actual Text - CMA CGM ARCTIC / 0TXCDE1MA

Extracted Text - CMA CGM ARCTIC / OTXCDE1MA

Is there something we can do to train Textract to return this character as 0 instead of O?

asked 2 years ago, 584 views
1 Answer
Thank you for using Textract. Because Textract is a machine learning model, it may not always reach the expected accuracy on certain documents, and it may not be working well for this use case. However, the Textract team continuously updates the model to improve quality and cover more use cases. To help us improve the model for your documents, please open a customer support ticket and share the documents that show the issue so we can analyze it further. Additionally, please watch for model quality updates announced on the AWS Textract public release channel.

Please note that Textract uses a built-in machine learning model that users cannot train or customize for their own use cases. Hence, we cannot train the Textract model to return the required result here.
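Since the model itself cannot be adjusted, one possible workaround is to correct the extracted text on the client side after Textract returns it. The following is only a minimal sketch, not an official Textract feature: the function name `fix_leading_zero` and the rule it applies (a 9-character uppercase alphanumeric code that follows " / " and should start with a digit) are assumptions inferred from the single example in the question, and would need to be adapted to the real document format.

```python
import re

# Hypothetical rule, assumed from the one example in the question:
# a code follows " / ", is 9 uppercase alphanumeric characters long,
# and its first character should be the digit 0, never the letter O.
LEADING_O = re.compile(r"(?<= / )O(?=[A-Z0-9]{8}\b)")


def fix_leading_zero(text: str) -> str:
    """Replace a leading letter 'O' with the digit '0' in matching codes."""
    return LEADING_O.sub("0", text)


if __name__ == "__main__":
    extracted = "CMA CGM ARCTIC / OTXCDE1MA"
    print(fix_leading_zero(extracted))
```

The pattern is deliberately narrow so that a legitimate letter "O" elsewhere in the text (for example in a vessel name) is left untouched; a rule this specific will miss codes that do not follow the assumed format.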

AWS
SUPPORT ENGINEER
answered 2 years ago
