Query regarding optimization of AWS Textract data extraction and API Gateway timeout issue


Dear AWS Support Team,

We are currently facing issues with our data extraction process using AWS Textract, leading to a 504 error in our API Gateway due to processing times exceeding the hard limit of 29 seconds. Our objective is to extract data from a 2-3 page PDF file, process it in under 10 seconds, and display it in a specific format on our portal.

Here is the current process flow:

1. Customer uploads a PDF form on our portal.
2. Upon clicking the "Continue" button, our portal triggers the Textract API to extract data from the PDF form.
3. The extracted data is displayed on the next screen in a specific format for the customer's preview.

Despite the trial version of Textract processing data in less than 2-3 seconds, we are encountering delays that surpass the acceptable timeframe.

We seek your guidance on the following:

1. Optimization techniques: Could you please provide best practices or optimization techniques for using Textract to expedite the data extraction process for 2-3 page PDF files?
2. Troubleshooting the delay: Are there any common issues or bottlenecks that might cause the extraction process to take longer than expected?
3. Performance enhancement: What additional measures or configurations can we apply within our system or Textract settings to ensure efficient and timely data extraction?

We aim to resolve this issue promptly to provide a seamless experience for our customers. Your insights and guidance on optimizing the use of Textract to meet our performance requirements would be greatly appreciated.

Thank you for your assistance in this matter.

1 Answer

Hi, thank you for using Textract. I am assuming you're using our async APIs, as these are multi-page documents. To process a multi-page PDF within 10 seconds, I'd recommend using our sync APIs, which are comparatively faster than the async APIs, by making a separate call for each page of the PDF. For more information, please check out https://docs.aws.amazon.com/textract/latest/dg/what-is.html
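
As a rough illustration of the per-page approach, here is a minimal sketch in Python. It assumes the PDF has already been split into single-page documents (PNG, JPEG, or single-page PDF bytes) by some other tool, and that the client passed in is a boto3 Textract client. The helper names `analyze_pages` and `extract_lines`, the `FORMS` feature type, and the worker count are illustrative choices, not part of any official recommendation. Sending the pages in parallel is an extra assumption on top of the suggestion above, so the total time is closer to the slowest page than to the sum of all pages.

```python
from concurrent.futures import ThreadPoolExecutor


def analyze_pages(textract_client, page_documents, feature_types=("FORMS",), max_workers=4):
    """Call the synchronous AnalyzeDocument API once per page, in parallel.

    textract_client: assumed to be boto3.client("textract") or a compatible object.
    page_documents:  list of bytes, one single-page document per entry
                     (splitting the PDF is assumed to happen elsewhere).
    Returns the responses in the same order as the input pages.
    """
    def analyze_one(document_bytes):
        return textract_client.analyze_document(
            Document={"Bytes": document_bytes},
            FeatureTypes=list(feature_types),
        )

    # pool.map preserves input order, so results line up with page numbers.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(analyze_one, page_documents))


def extract_lines(responses):
    """Collect the text of every LINE block, page by page."""
    lines = []
    for response in responses:
        for block in response.get("Blocks", []):
            if block.get("BlockType") == "LINE":
                lines.append(block.get("Text", ""))
    return lines
```

With real credentials, usage would look something like `responses = analyze_pages(boto3.client("textract"), pages)` followed by `extract_lines(responses)`. If only raw text is needed rather than form fields, the sync `detect_document_text` call could be swapped in, which typically does less work per page.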

answered 3 months ago
