Preamble:
We need to extract a unique ID from an image. The ID consists of 8 digits followed by a letter, like so: 99999999C. It appears in a Spanish document stored as a PNG file, and the API we use is detect-document-text.
The problem is as follows:
When we extract the text from the original PNG, we obtain 999999990; in other words, the last character is converted into a zero. We then took a snapshot of the original image and cut 2 cm from the right side, which produced the correct output.
However, when we cut those same 2 cm from the original PNG document itself, we still get the inaccurate 999999990 response.
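To make the failure concrete, here is a minimal sketch of how the faulty result can be flagged in the response. It assumes the response follows the usual detect-document-text shape (a `Blocks` list whose `WORD` entries carry `Text` and `Confidence`) and that the ID appears as a single word; the function name `classify_id_candidates` is made up for this post.

```python
import re

# Expected ID shape from the question: 8 digits followed by one uppercase letter.
ID_PATTERN = re.compile(r"\d{8}[A-Z]")
# Shape of the faulty output: 9 digits, i.e. the letter was read as a digit.
ALL_DIGITS = re.compile(r"\d{9}")


def classify_id_candidates(response):
    """Return (text, status, confidence) for every WORD block that looks like the ID.

    status is "ok" for 8 digits + letter and "suspect" for 9 digits.
    Assumption: the ID is reported as one WORD block.
    """
    results = []
    for block in response.get("Blocks", []):
        if block.get("BlockType") != "WORD":
            continue
        text = block.get("Text", "").strip()
        if ID_PATTERN.fullmatch(text):
            results.append((text, "ok", block.get("Confidence")))
        elif ALL_DIGITS.fullmatch(text):
            results.append((text, "suspect", block.get("Confidence")))
    return results
```

Logging the `Confidence` of the suspect word for both the original PNG and the snapshot may show whether the service is unsure about the last character or confidently wrong.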
What can we do to get a more accurate result? We have tried cropping the sides programmatically, adding DPI metadata, increasing contrast by 20-40%, converting the document to grayscale, and programmatically cropping it to eliminate the margins, but the same faulty data is still extracted.
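For reference, the preprocessing attempts listed above look roughly like the following sketch. It assumes Pillow is used for the image handling; the function name `preprocess_for_ocr` and the default values (300 DPI, a 1.3 contrast factor for a 30% increase) are illustrative, not our exact code.

```python
import io

from PIL import Image, ImageEnhance, ImageOps


def preprocess_for_ocr(img, crop_right_cm=2.0, dpi=300, contrast=1.3):
    """Crop the right edge, grayscale, boost contrast, and return PNG bytes with DPI metadata.

    Assumption: `dpi` is the resolution used to convert centimetres to pixels.
    """
    # Convert the crop width from centimetres to pixels (1 inch = 2.54 cm).
    crop_px = int(round(crop_right_cm / 2.54 * dpi))
    width, height = img.size
    cropped = img.crop((0, 0, max(width - crop_px, 1), height))

    # Convert to grayscale, then increase contrast (1.0 leaves the image unchanged).
    gray = ImageOps.grayscale(cropped)
    enhanced = ImageEnhance.Contrast(gray).enhance(contrast)

    # Write a PNG that carries the DPI metadata.
    buffer = io.BytesIO()
    enhanced.save(buffer, format="PNG", dpi=(dpi, dpi))
    return buffer.getvalue()
```

The returned bytes are what would be sent to the API in place of the original file, for example `preprocess_for_ocr(Image.open("document.png"))`, where `document.png` is a placeholder path.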
Edit: The document is perfectly human-readable.
Edit 2: Document with the error (the real one is not censored, but the error persists):
Image with no error (the real one is also not censored; both have the same quality):
I have updated the question with the censored images.
Also note that the documents we tested contain more information, and none of it is censored.