Sort and extract full text

0

I need to extract text from some PDF documents.

The layout of these texts varies (the columns where they start and end), but the beginning and end are always similar: each text starts with the word DECRETO and ends with the word CHEFE followed by something else.

Here is an example of the original text: https://i.postimg.cc/Xvk0TXJ9/texto.png

Is it possible to do this with AWS tools? What's the best way?

Note: the text is in Brazilian Portuguese (PT-BR).

2 Answers
0

Hi there, thank you for using Textract. At the moment, we do not provide a mechanism that supports your use case directly. We recommend handling it on the client side by post-processing the response based on the bounding boxes of the returned lines. I hope this helps!
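
As a rough illustration of that kind of client-side post-processing, here is a minimal sketch in Python. It assumes `blocks` is the `Blocks` list from a Textract text-detection response for a single page, where each `LINE` block carries `Text` and `Geometry.BoundingBox` (`Left`, `Top`). The function names, the `column_gap` threshold, and the rule "a line containing the word CHEFE closes the text" are assumptions made for this example, not something Textract provides, and the threshold would need tuning for real layouts.

```python
import re


def lines_in_reading_order(blocks, column_gap=0.15):
    """Order Textract LINE blocks column by column, top to bottom.

    column_gap is a hypothetical threshold (fraction of page width):
    a left edge further than this from the current column's first
    line is treated as the start of a new column.
    """
    lines = [b for b in blocks if b.get("BlockType") == "LINE"]
    lines.sort(key=lambda b: b["Geometry"]["BoundingBox"]["Left"])

    columns = []  # each entry is the list of LINE blocks in one column
    for line in lines:
        left = line["Geometry"]["BoundingBox"]["Left"]
        if columns:
            column_left = columns[-1][0]["Geometry"]["BoundingBox"]["Left"]
            if left - column_left <= column_gap:
                columns[-1].append(line)
                continue
        columns.append([line])

    ordered = []
    for column in columns:
        # Within a column, read from the top of the page downwards.
        column.sort(key=lambda b: b["Geometry"]["BoundingBox"]["Top"])
        ordered.extend(b["Text"] for b in column)
    return ordered


def extract_decrees(ordered_lines):
    """Collect each span from a line starting with DECRETO to a line containing CHEFE."""
    decrees, current = [], None
    for text in ordered_lines:
        if current is None and re.match(r"\s*DECRETO\b", text):
            current = []
        if current is not None:
            current.append(text)
            if re.search(r"\bCHEFE\b", text):
                decrees.append("\n".join(current))
                current = None
    return decrees
```

Calling `extract_decrees(lines_in_reading_order(blocks))` would then return one string per decree. Grouping by left edge is a deliberately simple choice: it works when columns are clearly separated, but indented lines or columns of uneven width may need a more robust clustering step.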

AWS
answered 2 years ago
  • Could Rekognition help by identifying each column, so that after some client-side processing the texts can be put in sequence and then passed to Textract? Will Rekognition let me identify each column separately?

-1

Yes, this is possible with Amazon Textract (which supports Portuguese). To learn more about how to extract text from PDFs, you can check out the documentation.

AWS
Heiko
answered 2 years ago
  • This is not possible with Textract, because it extracts and aligns the words line by line rather than by column, as I marked in the image.
