Automatic bookmarking of PDF document, AWS Textract service


I was wondering whether your service can automatically recognise reports in a searchable PDF document, bookmark them according to date and title, and then sort them in chronological order.

Dass
asked 9 months ago · 214 views
1 Answer

AWS Textract is a service that extracts text and data from scanned documents. It can extract the text from a PDF, but it has no built-in functionality for recognizing specific report formats, bookmarking them, or sorting them. However, you can build this functionality by combining Textract with additional services and libraries:

  • Use AWS Textract to extract all the text from the PDF. Textract can identify and extract text from scanned documents, and it provides the results in a structured format.
  • Analyze the extracted text to find the dates and titles of the reports. You can use regular expressions or a similar pattern-matching approach for this. This could be done using AWS Lambda.
  • PDF bookmarks can be created programmatically, but AWS doesn't offer a specific service for this. You would need to use a library or tool that supports this feature, such as PyPDF2 or PDFBox, within a Lambda function.
  • Sort the reports in chronological order. This could also be done in the Lambda function.
  • Finally, save the bookmarked and sorted PDF to a storage service such as Amazon S3.
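
As a rough illustration of the analysis and sorting steps above, here is a minimal Python sketch that uses only the standard library. It takes the `Blocks` list from a Textract text-detection response (each `LINE` block carries `Text` and, for multi-page results, `Page`) and looks for a date near the top of each page. The function names, the two date formats, and the rule "the first line of a dated page is the report title" are assumptions made for the example, not anything Textract provides. Calling Textract, writing the bookmarks, and reordering the pages with a PDF library are not shown.

```python
import re
from datetime import datetime

# Assumed date formats for this sketch: "2020-11-05" and "12 March 2021".
DATE_PATTERNS = [
    (re.compile(r"\b(\d{4}-\d{2}-\d{2})\b"), "%Y-%m-%d"),
    (re.compile(r"\b(\d{1,2} [A-Z][a-z]+ \d{4})\b"), "%d %B %Y"),
]


def lines_by_page(blocks):
    """Group the text of Textract LINE blocks by page number."""
    pages = {}
    for block in blocks:
        if block.get("BlockType") == "LINE":
            # "Page" may be missing for single-page results, so default to 1.
            pages.setdefault(block.get("Page", 1), []).append(block["Text"])
    return pages


def find_date(text):
    """Return the first recognisable date in text, or None."""
    for pattern, fmt in DATE_PATTERNS:
        match = pattern.search(text)
        if match:
            try:
                return datetime.strptime(match.group(1), fmt).date()
            except ValueError:
                continue  # looked like a date but was not a valid one
    return None


def find_reports(blocks):
    """Return (date, title, page) tuples sorted chronologically.

    Assumption: a report starts on any page that has a date within its
    first three lines, and the first line of that page is the title.
    """
    reports = []
    for page, lines in sorted(lines_by_page(blocks).items()):
        report_date = find_date(" ".join(lines[:3]))
        if report_date is not None:
            reports.append((report_date, lines[0], page))
    return sorted(reports)


# Hand-written sample shaped like the "Blocks" list of a Textract response.
sample_blocks = [
    {"BlockType": "PAGE", "Page": 1},
    {"BlockType": "LINE", "Page": 1, "Text": "Quarterly Review"},
    {"BlockType": "LINE", "Page": 1, "Text": "Date: 12 March 2021"},
    {"BlockType": "LINE", "Page": 2, "Text": "continued from previous page"},
    {"BlockType": "LINE", "Page": 3, "Text": "Site Inspection"},
    {"BlockType": "LINE", "Page": 3, "Text": "2020-11-05"},
]

for report_date, title, page in find_reports(sample_blocks):
    # Each tuple has what a bookmark needs: a label and a target page.
    print(f"{report_date.isoformat()} {title} (page {page})")
```

The sorted tuples could then be passed to whichever PDF library you choose in order to add one bookmark per report and, if needed, rearrange the pages. Real reports will probably need date patterns and a title rule tuned to their actual layout.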
EXPERT
answered 9 months ago
