Textract performance degradation

Hello, I need to OCR huge PDF documents as fast as possible because of an SLA. My first shot was sync mode, but I could not get more than a couple of pages per second, and AWS Support declined all my requests to increase the quota. So I switched to async mode. It works more or less OK most of the time: I can OCR 200 pages in less than 2 minutes. But around 15:00 UTC almost every day I see a huge performance degradation, and sometimes it takes 120 seconds to OCR a single page. The Textract job just gets stuck in the IN_PROGRESS state. Any ideas or suggestions?

Asked 2 years ago · 1524 views
2 Answers
Hi, thank you for using Textract, and I'm sorry to hear you're facing performance issues. If you're comfortable doing so, can you share sample job IDs and the Region where you're seeing these issues? You can also share these details through AWS Support.

AWS
answered 2 years ago
  • I also found this: "underestimate because of the timed-out jobs. If you want to build a real-time, customer-facing product with PDF inputs, AWS Textract is not the tool for you. Accuracy and speed results. Double asterisks indicate the best result for each measure" The above was the answer I got from AWS Support when I reported the performance problem. Is it true?

I expect the regular latency pattern you're seeing probably corresponds to changes in overall demand on the service in that Region.

So I'd suggest trying to route documents to a different AWS Region during these problem periods, if possible. Some testing would probably be needed to find the ideal schedules and Regions, but as a first guess I'd explore Regions in significantly different time zones and those with high default quotas.
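As a rough illustration of schedule-based routing, here is a minimal sketch. The Region names and the 14:30 to 16:30 UTC window are placeholders I made up, not measured values, and `pick_region` is a hypothetical helper; you would replace them with whatever your own latency tests show.

```python
from datetime import datetime, time, timezone

# Hypothetical values: both Regions and the slow window are placeholders
# to be replaced with the results of your own testing.
PRIMARY_REGION = "eu-west-1"
FALLBACK_REGION = "ap-southeast-2"
SLOW_WINDOW_UTC = (time(14, 30), time(16, 30))


def pick_region(now=None):
    """Return the Region to send the next Textract job to."""
    now = now or datetime.now(timezone.utc)
    start, end = SLOW_WINDOW_UTC
    current = now.astimezone(timezone.utc).time()
    # Inside the known-slow window, route to the fallback Region.
    if start <= current < end:
        return FALLBACK_REGION
    return PRIMARY_REGION
```

The result could then be passed to `boto3.client("textract", region_name=pick_region())`. Keep in mind that for async jobs the input S3 bucket has to be in the same Region as the Textract endpoint you call, so this assumes the documents are also uploaded or replicated to a bucket in the fallback Region.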

It's worth mentioning that for the async APIs, the performance of very short documents should be dominated by queuing and other overheads anyway, so it's probably not that useful to compare the per-page processing time of a 200-page document with that of a 1-page document.
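To see how much of the time is fixed overhead rather than per-page work, it may help to log the total wall-clock time per job next to its page count. Below is a minimal polling sketch, assuming a boto3 Textract client is passed in as `client`. `wait_for_job` is a hypothetical helper, and the `"TIMED_OUT"` label is my own, not a Textract status.

```python
import time


def wait_for_job(client, job_id, poll_seconds=2.0, timeout_seconds=300.0,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll an async text-detection job; return (status, seconds_waited)."""
    started = clock()
    while True:
        response = client.get_document_text_detection(JobId=job_id)
        status = response["JobStatus"]
        elapsed = clock() - started
        # Any status other than IN_PROGRESS means the job has finished.
        if status != "IN_PROGRESS":
            return status, elapsed
        if elapsed >= timeout_seconds:
            # "TIMED_OUT" is a label invented here, not returned by Textract.
            return "TIMED_OUT", elapsed
        sleep(poll_seconds)
```

Comparing `seconds_waited` for 1-page and 200-page jobs submitted at the same time of day should show whether the slowdown is in queuing or in the processing itself. For production use, the completion notification via SNS is probably a better fit than polling.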

AWS
Expert
Alex_T
answered 2 years ago
