AWS Textract demo web interface produces better results than the synchronous AnalyzeDocument API called through the Boto3 client


I processed the same sample document through the AWS Textract demo web interface and through our in-house portal, which consumes AWS Textract via the synchronous AnalyzeDocument API. The table results from the web interface are more accurate than those from the API. For example, all the table headers were properly extracted in the web interface, while some were missing when using the API on the same sample document. The Boto3 client was updated to the latest version (boto3 1.26.160, botocore 1.29.160), and the region used is us-east-1.

asked a year ago, 329 views
1 Answer

I noticed that when the operation is performed via the console, a StartExpenseAnalysis API call is made, which is an asynchronous API call for analyzing invoices and receipts. However, you are using the synchronous operation, AnalyzeExpense.

Please note that the asynchronous workflow uses a different pre-processing component than the synchronous one to process PDFs and images. Although we try to align the two in functionality and behavior, we do sometimes see discrepancies. I'd therefore suggest using the asynchronous operations.
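For the AnalyzeDocument use case described in the question, the asynchronous counterpart in Boto3 is `start_document_analysis` paired with `get_document_analysis`. The following is a minimal sketch of that workflow, not a definitive implementation. It assumes the document has already been uploaded to S3 (the asynchronous API reads from S3 rather than from raw bytes). The function name `analyze_tables_async`, the bucket and key names, and the polling interval and timeout are all hypothetical choices for illustration.

```python
import time


def analyze_tables_async(client, bucket, key, poll_seconds=5, timeout_seconds=300):
    """Run an asynchronous Textract table analysis and return all Blocks.

    `client` is expected to be a Textract client, e.g. boto3.client("textract").
    """
    # Start the job; the document is read from S3, not passed as bytes.
    job = client.start_document_analysis(
        DocumentLocation={"S3Object": {"Bucket": bucket, "Name": key}},
        FeatureTypes=["TABLES"],
    )
    job_id = job["JobId"]

    # Poll until the job leaves IN_PROGRESS or the (assumed) timeout expires.
    deadline = time.monotonic() + timeout_seconds
    while True:
        result = client.get_document_analysis(JobId=job_id)
        status = result["JobStatus"]
        if status != "IN_PROGRESS":
            break
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Textract job {job_id} still in progress")
        time.sleep(poll_seconds)

    # This sketch treats anything other than SUCCEEDED as an error.
    if status != "SUCCEEDED":
        raise RuntimeError(f"Textract job {job_id} ended with status {status}")

    # Results are paginated; follow NextToken until every page is collected.
    blocks = list(result.get("Blocks", []))
    while "NextToken" in result:
        result = client.get_document_analysis(
            JobId=job_id, NextToken=result["NextToken"]
        )
        blocks.extend(result.get("Blocks", []))
    return blocks


# Example usage (hypothetical bucket and key; requires AWS credentials):
#
#   import boto3
#   textract = boto3.client("textract", region_name="us-east-1")
#   blocks = analyze_tables_async(textract, "my-bucket", "sample.pdf")
#   cells = [b for b in blocks if b["BlockType"] == "CELL"]
```

Polling keeps the sketch short. For production workloads, the `NotificationChannel` parameter of `start_document_analysis` can publish job completion to an SNS topic instead.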

answered a year ago
