Textract performance degradation


Hello, I need to OCR huge PDF documents as fast as possible because of an SLA. My first shot was to use SYNC mode, but I was not able to get more than a couple of pages per second, and AWS Support declined all my requests to increase the quota. So I decided to use ASYNC mode. It works more or less OK most of the time: I am able to OCR 200 pages in less than 2 minutes. But at around 15:00 UTC almost every day I see a huge performance degradation; sometimes it takes 120 seconds to OCR a single page, and the Textract job just gets stuck in the IN_PROGRESS state. Any ideas or suggestions?

asked 2 years ago, 1438 views
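For jobs that sit in IN_PROGRESS, a client can at least bound how long it waits. Below is a minimal sketch of polling with exponential backoff and a hard deadline, so a stuck job can be flagged (for example, to retry or reroute it) instead of silently eating into the SLA. It assumes a `get_status` callable returning the job's status string; with boto3 this could wrap `textract.get_document_text_detection(JobId=job_id)["JobStatus"]`. The deadline, backoff values, and the `"TIMED_OUT"` marker are hypothetical choices, not Textract features. (Textract can also publish job completion to an SNS topic through the `NotificationChannel` parameter, which avoids polling altogether.)

```python
import time
from typing import Callable

# Statuses the Textract async Get* APIs report once a job is finished.
TERMINAL_STATUSES = {"SUCCEEDED", "FAILED", "PARTIAL_SUCCESS"}


def wait_for_job(
    get_status: Callable[[], str],
    deadline_s: float,
    initial_delay_s: float = 1.0,
    max_delay_s: float = 10.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_status() with exponential backoff until the job reaches a
    terminal status or deadline_s seconds have elapsed.

    Returns the terminal status, or "TIMED_OUT" (a hypothetical marker,
    not a Textract status) if the job is still running at the deadline.
    """
    start = clock()
    delay = initial_delay_s
    while True:
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        elapsed = clock() - start
        if elapsed >= deadline_s:
            return "TIMED_OUT"
        # Never sleep past the deadline; then double the delay up to a cap.
        sleep(min(delay, deadline_s - elapsed))
        delay = min(delay * 2, max_delay_s)
```

The `clock` and `sleep` parameters exist only so the loop can be exercised with a fake clock; in real use the defaults apply.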
2 Answers

Hi, thank you for using Textract, and I'm sorry to hear you're facing performance issues. If you're comfortable doing so, can you share sample job IDs and the region where you're seeing these issues? You can also reach out via AWS Support to share these details.

AWS
answered 2 years ago
  • Also, I found this: "…underestimate because of the timed-out jobs. If you want to build a real-time, customer-facing product with PDF inputs, AWS Textract is not the tool for you. Accuracy and speed results. Double asterisks indicate the best result for each measure." This is the answer I got from AWS Support when I reported the performance issue. Is it true?


I expect the regular latency pattern you're seeing probably corresponds to changing overall demand on the service in the region.

Therefore I'd suggest trying to route documents to a different AWS Region during these problem periods, if possible. Some testing would probably be needed to find the ideal schedules and regions, but as a first guess I'd explore regions in significantly different time zones and those with high default quotas.
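A minimal sketch of time-based routing, assuming a hypothetical schedule: jobs submitted during a configured UTC window go to an alternate region, everything else goes to a default one. The window (14:00 to 18:00 UTC) and both region names are placeholders to be replaced with values from your own latency testing. The returned name could be passed as `region_name` to `boto3.client("textract", ...)`; for async jobs the input document would likely also need to be in an S3 bucket in that same region.

```python
from datetime import datetime, timezone

# Hypothetical schedule: UTC hour ranges [start, end) mapped to a region.
# Placeholder values; derive real ones from your own measurements.
PEAK_WINDOWS = [
    (14, 18, "ap-southeast-2"),  # e.g. steer away from the ~15:00 UTC slowdown
]
DEFAULT_REGION = "us-east-1"  # placeholder for your usual region


def pick_region(now: datetime) -> str:
    """Return the region to submit a new Textract job to at time `now`.

    `now` should be timezone-aware; it is converted to UTC before the
    hour is compared against the schedule.
    """
    hour = now.astimezone(timezone.utc).hour
    for start, end, region in PEAK_WINDOWS:
        if start <= hour < end:
            return region
    return DEFAULT_REGION
```

In practice the caller would do something like `pick_region(datetime.now(timezone.utc))` when starting each job, so jobs already running stay in the region they were submitted to.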

It's worth mentioning that for the async APIs, the performance of very short documents should be dominated by queuing and other overheads anyway, so it's probably not that useful to compare the per-page processing time of a 200-page document with that of a 1-page document.
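To make the overhead point concrete, here is a toy model (an assumption for illustration, not a description of how Textract actually schedules work): total job time = fixed overhead + pages × per-page time. With a hypothetical 10 s overhead and 0.5 s per page, a 1-page job averages 10.5 s per page while a 200-page job averages 0.55 s per page, even though the per-page work is identical.

```python
def per_page_seconds(pages: int, overhead_s: float, per_page_s: float) -> float:
    """Average wall-clock seconds per page for one async job under a toy
    model: total = fixed overhead (queuing, job setup) + pages * per-page work.
    """
    if pages <= 0:
        raise ValueError("pages must be positive")
    total = overhead_s + pages * per_page_s
    return total / pages


# Same hypothetical per-page cost, very different per-page averages.
for pages in (1, 200):
    print(pages, per_page_seconds(pages, overhead_s=10.0, per_page_s=0.5))
```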

AWS
EXPERT
Alex_T
answered 2 years ago
