Sort and extract full text


I need to extract text from some PDF documents.

The layout of these texts varies (the column where the text starts and ends is not fixed), but the beginning and end are always similar: the text starts with the word DECRETO and ends with the word CHEFE followed by something else.
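Once the text is in the right reading order, cutting out the section is a plain string search. A minimal sketch, assuming the section runs from the uppercase word `DECRETO` up to the end of the line that contains `CHEFE` (my reading of "CHEFE + something"); `DECREE_PATTERN` and `extract_decrees` are hypothetical names:

```python
import re

# Hypothetical pattern: from "DECRETO" (uppercase only, so "decreto" in the
# body is ignored) lazily up to the first "CHEFE", then the rest of that line.
DECREE_PATTERN = re.compile(r"DECRETO.*?CHEFE[^\n]*", re.DOTALL)


def extract_decrees(text: str) -> list[str]:
    """Return every 'DECRETO ... CHEFE <rest of line>' span found in text."""
    return [m.group(0).strip() for m in DECREE_PATTERN.finditer(text)]


if __name__ == "__main__":
    sample = (
        "Cabeçalho do diário\n"
        "DECRETO Nº 123\n"
        "Art. 1º Este decreto entra em vigor na data de sua publicação.\n"
        "CHEFE DO PODER EXECUTIVO\n"
        "Outro conteúdo\n"
    )
    for decree in extract_decrees(sample):
        print(decree)
```

The non-greedy `.*?` stops at the first `CHEFE` after each `DECRETO`, so several decrees on the same page come back as separate items.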

An example of the original text can be seen here: https://i.postimg.cc/Xvk0TXJ9/texto.png

Is it possible to do this with AWS tools? What's the best way?

Note: the text is in Brazilian Portuguese (pt-BR).

2 answers

Hi there, thank you for using Textract. At the moment we do not provide a mechanism that supports your use case directly. We recommend handling it on the client side by post-processing the response, using the bounding boxes of the lines returned by Textract. I hope this helps!
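The post-processing suggested above can be sketched as follows. Textract's `DetectDocumentText` response contains a `Blocks` list; each `LINE` block has `Text` and `Geometry.BoundingBox` with normalized `Left`, `Top`, `Width` and `Height`. This is a heuristic sketch, not an AWS-provided feature: it assumes a single page, that columns are separated by a horizontal jump in `Left` larger than `gap` (a hypothetical threshold you would tune), and it ignores titles that span several columns. `order_lines_by_column` is a hypothetical helper name.

```python
from typing import Any, Optional


def order_lines_by_column(blocks: list[dict[str, Any]], gap: float = 0.05) -> list[str]:
    """Return LINE texts in reading order: column by column, top to bottom."""
    # Keep only LINE blocks (responses also contain PAGE and WORD blocks).
    lines = [b for b in blocks if b.get("BlockType") == "LINE"]
    # Sort by left edge so lines belonging to the same column become adjacent.
    lines.sort(key=lambda b: b["Geometry"]["BoundingBox"]["Left"])

    columns: list[list[dict[str, Any]]] = []
    last_left: Optional[float] = None
    for line in lines:
        left = line["Geometry"]["BoundingBox"]["Left"]
        # Assumption: a jump larger than `gap` in the left edge starts a new column.
        if last_left is None or left - last_left > gap:
            columns.append([])
        columns[-1].append(line)
        last_left = left

    ordered: list[str] = []
    for column in columns:
        # Within a column, read lines from top to bottom.
        column.sort(key=lambda b: b["Geometry"]["BoundingBox"]["Top"])
        ordered.extend(b["Text"] for b in column)
    return ordered
```

With boto3, `blocks` would come from `boto3.client("textract").detect_document_text(Document={"Bytes": data})["Blocks"]`; multi-page PDFs go through the asynchronous `start_document_text_detection` / `get_document_text_detection` calls instead, and you would group blocks by their `Page` field before calling the helper. The ordered lines can then be joined with `"\n"` and searched for the DECRETO ... CHEFE span.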

AWS
Answered 2 years ago
  • Could Rekognition help by identifying each column, so that after some client-side processing I could put the texts in sequence and then use Textract? With Rekognition, would I be able to identify each column separately?


Yes, this is possible with Amazon Textract, which supports Portuguese. To learn more about how to extract text from PDFs, you can check out the documentation.

AWS
Heiko
Answered 2 years ago
  • It is not possible with Textract, because it extracts and aligns the words line by line, not by column as I marked in the image.
