Sort and extract full text

0

I need to extract text from some PDF documents.

The formatting of these texts varies (the columns where each text starts and ends differ), but the beginning and end are always similar: each text starts with the word DECRETO and ends with the word CHEFE followed by something.

In this link you can see an example of the original text: https://i.postimg.cc/Xvk0TXJ9/texto.png

Is it possible to do this with AWS tools? What's the best way?

Note: the text is in Brazilian Portuguese (PT-BR).

Asked 2 years ago, viewed 339 times
2 Answers
0

Hi there, thank you for using Textract. At the moment, we do not provide a mechanism that supports your use case directly. We recommend handling it on the client side, by post-processing the bounding boxes of the lines returned in the response. I hope this helps!
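A minimal sketch of the kind of client-side post-processing described above, not an official AWS method. It assumes a single-page Textract response whose `Blocks` contain `LINE` entries with `Geometry.BoundingBox` values (`Left`, `Top`, `Width`, `Height` as ratios of the page size). The function names `lines_in_reading_order` and `extract_decrees`, the 50% overlap threshold, and the assumption that DECRETO and CHEFE each begin a line are illustrative choices, not something the thread confirms.

```python
import re


def lines_in_reading_order(blocks):
    """Return LINE texts ordered column by column, top to bottom.

    Assumption: lines of the same column overlap horizontally, and the
    widest lines are a good estimate of each column's extent.
    """
    lines = [b for b in blocks if b.get("BlockType") == "LINE"]
    columns = []  # each column: {"left": float, "right": float, "lines": [...]}

    # Place the widest lines first so they define the column extents.
    for line in sorted(lines, key=lambda b: -b["Geometry"]["BoundingBox"]["Width"]):
        box = line["Geometry"]["BoundingBox"]
        left, right = box["Left"], box["Left"] + box["Width"]
        for col in columns:
            overlap = min(right, col["right"]) - max(left, col["left"])
            # Join a column when more than half of the line lies inside it.
            if overlap > 0.5 * box["Width"]:
                col["lines"].append(line)
                break
        else:
            columns.append({"left": left, "right": right, "lines": [line]})

    ordered = []
    for col in sorted(columns, key=lambda c: c["left"]):
        col["lines"].sort(key=lambda b: b["Geometry"]["BoundingBox"]["Top"])
        ordered.extend(b["Text"] for b in col["lines"])
    return ordered


def extract_decrees(ordered_lines):
    """Return every span from a line starting with DECRETO through the
    first following line starting with CHEFE (inclusive)."""
    text = "\n".join(ordered_lines)
    pattern = re.compile(r"^DECRETO\b.*?^CHEFE\b[^\n]*", re.DOTALL | re.MULTILINE)
    return [m.group(0) for m in pattern.finditer(text)]
```

With a response from Textract in a variable such as `response`, the call would look like `extract_decrees(lines_in_reading_order(response["Blocks"]))`. For multi-page results, the blocks would first need to be grouped by their `Page` value and processed one page at a time.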

AWS
answered 2 years ago
  • Could Rekognition help by identifying each column, so that after some client-side processing the texts are left in sequence and can then be passed to Textract? With Rekognition, will I be able to identify each column separately?

-1

Yes, this is possible with Amazon Textract (which supports Portuguese). To learn more about how to extract text from PDFs, you can check out the documentation.

AWS
Heiko
answered 2 years ago
  • With Textract this is not possible, because it extracts and aligns the words line by line across the page rather than following the columns I marked in the image.
