[Announcement] Amazon Textract adds synchronous support for single page PDF documents and support for PDF documents containing JPEG 2000 encoded images


Amazon Textract is a machine learning service that automatically extracts text, handwriting, and data from scanned documents and goes beyond simple optical character recognition (OCR) to identify, understand, and extract data from forms and tables.

Previously, customers had to convert PDF documents to PNG or JPEG before calling Textract's synchronous APIs (DetectDocumentText, AnalyzeDocument, AnalyzeExpense, and AnalyzeID) to extract text and data from documents such as claim forms, invoices and receipts, contracts and agreements, ID documents, and application forms. Starting today, Amazon Textract removes that pre-processing step: synchronous operations accept single-page PDF documents directly, so customers can extract text and data without first converting them to PNG or JPEG.
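As a rough illustration of what this enables, the sketch below passes the raw bytes of a single-page PDF to the synchronous DetectDocumentText operation and collects the detected lines. It assumes the boto3 SDK is installed and AWS credentials and a region are already configured; the helper name `extract_lines_from_pdf` and the `%PDF-` header check are illustrative choices, not part of the announcement.

```python
from typing import Any, List, Optional


def extract_lines_from_pdf(pdf_bytes: bytes, client: Optional[Any] = None) -> List[str]:
    """Return the text of each LINE block Textract detects in a single-page PDF.

    `client` is an optional Textract client; when omitted, one is created with
    boto3 (assumes boto3 is installed and AWS credentials are configured).
    """
    # Local sanity check (an assumption of this sketch, not a Textract rule):
    # PDF files begin with the "%PDF-" marker.
    if not pdf_bytes.startswith(b"%PDF-"):
        raise ValueError("input does not look like a PDF document")

    if client is None:
        import boto3  # imported lazily so a stub client can be used without the SDK

        client = boto3.client("textract")

    # Synchronous call: the PDF bytes are sent directly, with no PNG/JPEG conversion.
    response = client.detect_document_text(Document={"Bytes": pdf_bytes})

    # The response contains PAGE, LINE, and WORD blocks; keep only the lines.
    return [
        block["Text"]
        for block in response.get("Blocks", [])
        if block.get("BlockType") == "LINE"
    ]
```

A call might look like `extract_lines_from_pdf(open("claim.pdf", "rb").read())`, where `claim.pdf` is a hypothetical single-page file. Multi-page PDFs are outside what this announcement covers for synchronous operations.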

Additionally, Amazon Textract now supports JPEG 2000 encoded images inside PDF documents, so you can extract text and data from PDFs that contain them.

To get started, log in to the Amazon Textract console and test your PDF documents. To learn more about Textract capabilities, please visit the Amazon Textract website, developer guide, or resource page.

  • This is an announcement migrated from AWS Forums that does not require an answer

AWS
Asked 2 years ago, viewed 168 times