Automatic bookmarking of PDF documents with AWS Textract


I was wondering whether your service can automatically recognise reports in a searchable PDF document, bookmark them according to date and title, and then sort them in chronological order.

Dass
Asked 9 months ago, 214 views
1 answer

Amazon Textract is a service that extracts text and data from scanned documents. It can extract the text from a PDF, but it has no built-in functionality for recognizing specific report formats, bookmarking them, or sorting them. However, you can build this workflow by combining Textract with other services and libraries:

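  • Use Amazon Textract to extract all the text from the PDF. Textract identifies text in scanned documents and returns it in a structured format: a list of blocks (pages, lines, and words) with their page numbers and positions. For a multi-page PDF, use the asynchronous API (StartDocumentTextDetection, then GetDocumentTextDetection), which reads the document from Amazon S3.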
  • Analyze the extracted text to find the date and title of each report, for example with regular expressions or a similar pattern-matching approach. This step can run in an AWS Lambda function.
  • PDF bookmarks can be created programmatically, but AWS doesn't offer a dedicated service for this. You would need a library that supports it, such as pypdf (the successor to PyPDF2) in Python or Apache PDFBox in Java, running inside a Lambda function.
  • Sort the reports in chronological order, reordering their pages in the output PDF accordingly. This can also be done in the Lambda function.
  • Finally, save the bookmarked and sorted PDF to a storage service such as Amazon S3.
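The analysis, sorting, and bookmarking steps can be sketched in Python as below. This is a minimal sketch under stated assumptions, not a definitive implementation: it assumes (hypothetically) that a report starts on a page whose first few lines contain a date in `YYYY-MM-DD` or day-first `DD/MM/YYYY` form, and that the report's title is the first of those lines that is not a date. It reads the `Blocks` list from a Textract text-detection response (using the `BlockType`, `Page`, and `Text` keys), groups pages into reports, and sorts them by date. `write_bookmarked_pdf` then uses the third-party pypdf library to copy the pages in that order and add one bookmark per report. The names `Report`, `find_date`, `lines_by_page`, `group_reports`, and `write_bookmarked_pdf` are illustrative and not part of any AWS API; real reports will likely need patterns tuned to their layout.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Assumed date formats: ISO (2023-06-15) and day-first (15/06/2023).
ISO_DATE = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")
DMY_DATE = re.compile(r"\b(\d{1,2})/(\d{1,2})/(\d{4})\b")


@dataclass
class Report:
    title: str
    report_date: Optional[date]  # None for pages before the first dated report
    pages: list[int]  # 0-based page indexes in the source PDF


def find_date(text: str) -> Optional[date]:
    """Return the first ISO or day-first date found in text, or None."""
    m = ISO_DATE.search(text)
    if m:
        year, month, day = (int(g) for g in m.groups())
    else:
        m = DMY_DATE.search(text)
        if m is None:
            return None
        day, month, year = (int(g) for g in m.groups())
    try:
        return date(year, month, day)
    except ValueError:  # pattern matched but not a real date, e.g. 31/02/2023
        return None


def lines_by_page(blocks: list[dict]) -> dict[int, list[str]]:
    """Group the Text of Textract LINE blocks by their 1-based Page number.

    Assumes LINE blocks appear in reading order, which is typical but not
    guaranteed. PAGE blocks are registered too, so blank pages are kept.
    """
    pages: dict[int, list[str]] = {}
    for block in blocks:
        kind = block.get("BlockType")
        if kind in ("PAGE", "LINE"):
            page_lines = pages.setdefault(block.get("Page", 1), [])
            if kind == "LINE":
                page_lines.append(block["Text"])
    return pages


def group_reports(blocks: list[dict], header_lines: int = 3) -> list[Report]:
    """Split pages into reports and return them sorted by date (undated first)."""
    reports: list[Report] = []
    pages = lines_by_page(blocks)
    for page_no in sorted(pages):
        header = pages[page_no][:header_lines]
        found = next((d for d in map(find_date, header) if d is not None), None)
        if found is not None:
            # Hypothetical rule: the title is the first header line that is not a date.
            title = next((line for line in header if find_date(line) is None), header[0])
            reports.append(Report(title, found, [page_no - 1]))
        elif reports:
            reports[-1].pages.append(page_no - 1)  # continuation page
        else:
            reports.append(Report(header[0] if header else "(untitled)", None, [page_no - 1]))
    # sorted() is stable, so reports with the same date keep their original order.
    return sorted(reports, key=lambda r: (r.report_date is not None, r.report_date or date.min))


def write_bookmarked_pdf(src_path: str, dst_path: str, reports: list[Report]) -> None:
    """Copy pages in report order and add one top-level bookmark per report."""
    from pypdf import PdfReader, PdfWriter  # third-party: pip install pypdf

    reader = PdfReader(src_path)
    writer = PdfWriter()
    next_page = 0  # 0-based index of the next page added to the output
    for report in reports:
        for index in report.pages:
            writer.add_page(reader.pages[index])
        label = report.title
        if report.report_date is not None:
            label = f"{report.report_date.isoformat()} {report.title}"
        writer.add_outline_item(label, next_page)
        next_page += len(report.pages)
    writer.write(dst_path)
```

In a Lambda function, `blocks` would be the concatenated `Blocks` from every page of `GetDocumentTextDetection` results (which are paginated with `NextToken`), and `dst_path` would be a file under `/tmp` that is then uploaded to S3. Pages that carry no detectable date are attached to the preceding report, so no pages are dropped from the output.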
Expert
Answered 9 months ago
