[Announcement] Amazon Textract adds synchronous support for single-page PDF documents and support for PDF documents containing JPEG 2000 encoded images


Amazon Textract is a machine learning service that automatically extracts text, handwriting, and data from scanned documents and goes beyond simple optical character recognition (OCR) to identify, understand, and extract data from forms and tables.

Previously, customers had to convert PDF documents to PNG or JPEG before calling Textract's synchronous APIs (DetectDocumentText, AnalyzeDocument, AnalyzeExpense, and AnalyzeID) to extract text and data from documents such as claim forms, invoices and receipts, contracts and agreements, ID documents, and application forms. Starting today, Amazon Textract removes that pre-processing step: the synchronous operations accept single-page PDF documents directly, so customers can extract text and data from PDFs without first converting them to PNG or JPEG.
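A minimal sketch of what this looks like from code, assuming the AWS SDK for Python (boto3): the raw PDF bytes go straight into `Document={"Bytes": ...}` with no PNG or JPEG conversion step. The helper names `load_pdf_bytes` and `extract_lines` are hypothetical, and the client is passed in as a parameter so the sketch can be exercised without AWS credentials. Only the boto3 methods `detect_document_text` and `analyze_document` and the `Blocks`, `BlockType`, and `Text` response fields come from the Textract API.

```python
from pathlib import Path
from typing import Any, List, Optional, Sequence

PDF_MAGIC = b"%PDF-"


def load_pdf_bytes(path: str) -> bytes:
    """Read a PDF from disk with a cheap header check (hypothetical helper).

    Many PDF readers tolerate a few junk bytes before the header, so this
    only looks for the %PDF- marker within the first 1024 bytes.
    """
    data = Path(path).read_bytes()
    if PDF_MAGIC not in data[:1024]:
        raise ValueError(f"{path} does not look like a PDF")
    return data


def extract_lines(client: Any, pdf_bytes: bytes,
                  feature_types: Optional[Sequence[str]] = None) -> List[str]:
    """Send a single-page PDF to a synchronous Textract operation.

    With no feature_types this calls DetectDocumentText; otherwise it calls
    AnalyzeDocument with the given FeatureTypes (for example "FORMS" or
    "TABLES"). Returns the text of every LINE block in response order.
    """
    # boto3 accepts raw bytes here and handles the base64 encoding itself.
    document = {"Bytes": pdf_bytes}
    if feature_types:
        response = client.analyze_document(Document=document,
                                           FeatureTypes=list(feature_types))
    else:
        response = client.detect_document_text(Document=document)
    # Keep only LINE blocks; PAGE and WORD blocks are skipped.
    return [block["Text"] for block in response.get("Blocks", [])
            if block.get("BlockType") == "LINE"]
```

With boto3 installed and credentials configured, a call might look like `extract_lines(boto3.client("textract"), load_pdf_bytes("claim-form.pdf"))`, where the file name is a placeholder. The synchronous operations handle a single page only; multi-page PDFs still go through the asynchronous operations such as StartDocumentTextDetection, which read the document from Amazon S3.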

Additionally, Amazon Textract now supports PDF documents containing JPEG 2000 encoded images, so you can extract text and data from those images within your PDF documents.

To get started, sign in to the Amazon Textract console to test your PDF documents. To learn more about Textract capabilities, visit the Amazon Textract website, developer guide, or resource page.

  • This is an announcement migrated from AWS Forums that does not require an answer

AWS
Asked 2 years ago, 168 views