Textract performance degradation


Hello, I need to OCR huge PDF documents as fast as possible because of an SLA. My first attempt was to use sync mode. I could not get more than a couple of pages per second, and AWS Support declined all requests to increase the quota. So I decided to use async mode. It works more or less OK most of the time: I am able to OCR 200 pages in less than 2 minutes. But around 15:00 UTC almost every day I see a huge performance degradation; sometimes it takes 120 seconds to OCR a single page. The Textract job just gets stuck in the IN_PROGRESS state. Any ideas or suggestions?

asked 2 years ago, 1524 views
2 answers

Hi, thank you for using Textract, and I'm sorry to hear you're facing performance issues. If you're comfortable doing so, can you share sample job IDs and the Region where you're seeing these issues? You can also reach out via AWS Support to share these details.

AWS
answered 2 years ago
  • I also found this: "underestimate because of the timed-out jobs. If you want to build a real-time, customer-facing product with PDF inputs, AWS Textract is not the tool for you. Accuracy and speed results. Double asterisks indicate the best result for each measure" The above was the answer I got from AWS Support when I reported the performance problem. Is it true?


I expect the regular latency pattern you're seeing probably corresponds to changes in overall demand on the service in that Region.

Therefore, I'd suggest routing documents to a different AWS Region during these problem periods, if possible. Some testing would probably be needed to find ideal schedules and Regions, but as a first guess I'd explore Regions in significantly different time zones and those with high default quotas.
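As a rough illustration of that idea, here is a minimal Python sketch, not a tested production setup. The Region names, the slow-hours window, and the helper names `pick_region` and `run_ocr` are all hypothetical; only `start_document_text_detection`, `get_document_text_detection`, and the `JobStatus` field come from the Textract API. The client is passed in so the logic can be exercised without AWS credentials.

```python
import time
from datetime import datetime, timezone

# Hypothetical values: tune these from your own latency measurements.
SLOW_HOURS_UTC = range(14, 17)       # assumed slow window around 15:00 UTC
PRIMARY_REGION = "us-east-1"         # assumption, not from the thread
FALLBACK_REGION = "ap-southeast-2"   # assumption, not from the thread


def pick_region(now=None):
    """Return the Region to submit to, based on the current UTC hour."""
    now = now or datetime.now(timezone.utc)
    return FALLBACK_REGION if now.hour in SLOW_HOURS_UTC else PRIMARY_REGION


def run_ocr(client, bucket, key, timeout_s=300, poll_s=2.0, sleep=time.sleep):
    """Start an async text-detection job and poll until it leaves IN_PROGRESS.

    Raises TimeoutError if the job is still IN_PROGRESS at the deadline,
    so the caller can resubmit elsewhere instead of waiting indefinitely.
    """
    job_id = client.start_document_text_detection(
        DocumentLocation={"S3Object": {"Bucket": bucket, "Name": key}}
    )["JobId"]
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        resp = client.get_document_text_detection(JobId=job_id)
        if resp["JobStatus"] != "IN_PROGRESS":
            return resp  # SUCCEEDED, FAILED or PARTIAL_SUCCESS
        sleep(poll_s)
    raise TimeoutError(f"Textract job {job_id} still IN_PROGRESS after {timeout_s}s")
```

With boto3 this would be called roughly as `run_ocr(boto3.client("textract", region_name=pick_region()), bucket, key)`. As far as I know, the S3 bucket has to be in the same Region as the Textract endpoint, so the documents would also need to be copied or replicated to the fallback Region.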

It's worth mentioning that for async APIs, the performance of very short documents should be dominated by queuing and other overheads anyway, so it's probably not that useful to compare the per-page processing time of a 200-page doc with that of a 1-page doc.
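To make that concrete, here is a small Python sketch using a deliberately simple model: job time = fixed overhead + pages × per-page time. The model and the overhead and per-page numbers passed in the usage example are assumptions for illustration, not measured Textract figures; only the "200 pages in about 120 seconds" figure comes from the question.

```python
def per_page_seconds(total_seconds, pages):
    """Naive per-page time: wall-clock time divided by page count."""
    return total_seconds / pages


def modeled_job_seconds(pages, overhead_s, page_s):
    """Assumed model: a fixed queuing/startup overhead plus linear per-page work."""
    return overhead_s + pages * page_s


if __name__ == "__main__":
    # Figure from the question: 200 pages in roughly 120 seconds.
    print(per_page_seconds(120, 200))

    # Hypothetical split of that time: 30 s overhead, 0.45 s per page.
    for pages in (1, 200):
        total = modeled_job_seconds(pages, overhead_s=30, page_s=0.45)
        print(pages, total, per_page_seconds(total, pages))
```

Under these assumed numbers, the 200-page job averages 0.6 seconds per page while the 1-page job spends about 30 seconds almost entirely on overhead, so dividing by page count says little about a short document.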

AWS
EXPERT
Alex_T
answered 2 years ago
