Amazon Textract extraction speed


Good morning. I have an authorized quota of 60 for the "DetectDocumentText throttle limit in transactions per second" API quota. We make the connection and everything works: the extracted text is correct. However, processing is very slow. We are only using 10 to 17 requests per minute, when we should be able to use between 3,000 and 3,600 per minute (60 TPS × 60 s = 3,600). I want to know what else we can review to increase throughput, because we need to process 5,000,000 pages per day and at this rate we will never finish. The instance that connects to the Textract API is a c4.8xlarge, and the documents are stored in S3. I look forward to your comments. Thank you very much.
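The figures in the question can be checked with a little arithmetic. This is a minimal Python sketch; the function names are illustrative, and the per-call latency passed to `required_concurrency` is an assumption, not a measured Textract value.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def required_tps(pages_per_day: float) -> float:
    """Average requests per second needed if each page is one DetectDocumentText call."""
    return pages_per_day / SECONDS_PER_DAY


def days_to_finish(pages: float, pages_per_minute: float) -> float:
    """Wall-clock days needed to process `pages` at a steady per-minute rate."""
    return pages / pages_per_minute / (60 * 24)


def required_concurrency(tps: float, latency_s: float) -> float:
    """Little's law: requests in flight = throughput x time each request takes.
    latency_s is an assumed average round-trip time per call."""
    return tps * latency_s


print(f"needed TPS: {required_tps(5_000_000):.1f}")
print(f"days at 17/min: {days_to_finish(5_000_000, 17):.0f}")
print(f"hours at 3600/min: {days_to_finish(5_000_000, 3600) * 24:.1f}")
print(f"in-flight calls at ~1 s each: {required_concurrency(required_tps(5_000_000), 1.0):.0f}")
```

This works out to roughly 57.9 pages per second, so even at the full 60 TPS quota one day's volume takes about 23.1 hours, leaving almost no headroom for retries. At 17 requests per minute the same 5,000,000 pages would take about 204 days. By Little's law, if a call takes on the order of a second, about 58 requests must be in flight at all times, which a loop that waits for each response before sending the next cannot provide.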

CDT
asked a year ago, 603 views
2 Answers
  1. You might check whether the size (and therefore the resolution) of your images is larger than you really need, since that could be slowing down your processing. I would try down-sampling the images and checking that accuracy does not decrease; if it holds, processing time could potentially go down.
  2. I'm not sure I fully understood your architecture flow, but I advise you to look at this sample, which shows how to handle concurrent Textract requests and queuing: https://github.com/aws-samples/amazon-textract-serverless-large-scale-document-processing
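The down-sampling suggestion can be tried with a small helper before sending pages to Textract. This is a sketch assuming Pillow is installed; `target_size`, `downsample_to_jpeg`, the `max_side=2000` cap and JPEG `quality=85` are hypothetical starting values to tune against your own accuracy checks, not Textract recommendations.

```python
import io


def target_size(width: int, height: int, max_side: int = 2000) -> tuple[int, int]:
    """Shrink (width, height) so the longer side is at most max_side, keeping
    the aspect ratio. Images already within the limit are never upscaled."""
    longest = max(width, height)
    if longest <= max_side:
        return width, height
    scale = max_side / longest
    return max(1, round(width * scale)), max(1, round(height * scale))


def downsample_to_jpeg(path: str, max_side: int = 2000, quality: int = 85) -> bytes:
    """Load an image, down-sample it if needed, and re-encode it as JPEG bytes."""
    from PIL import Image  # Pillow; imported lazily so target_size works without it

    with Image.open(path) as img:
        rgb = img.convert("RGB")  # loads pixel data; independent of the closed file
    size = target_size(rgb.width, rgb.height, max_side)
    if size != rgb.size:
        rgb = rgb.resize(size, Image.LANCZOS)  # high-quality downscaling filter
    buf = io.BytesIO()
    rgb.save(buf, format="JPEG", quality=quality)
    return buf.getvalue()
```

The resulting bytes can be uploaded back to S3 or passed directly as `Document={"Bytes": data}`. Compare the extracted text on a sample of pages at a few `max_side` values before applying this to the whole corpus.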
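Using only 10 to 17 of 3,600 available requests per minute suggests (this is an inference, since the question does not show the client code) that pages are sent one at a time, each waiting for the previous response. The linked sample does this with a serverless queue; the sketch below shows the same idea on a single instance with a thread pool plus a client-side rate limiter that keeps the aggregate rate under the quota. `RateLimiter`, `extract_text`, `process_all`, the `tps=50` headroom and `workers=64` are hypothetical choices; the only real API used is boto3's `detect_document_text` with an `S3Object` document.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class RateLimiter:
    """Thread-safe limiter that spaces calls at least 1/rate seconds apart."""

    def __init__(self, rate: float) -> None:
        self._interval = 1.0 / rate
        self._lock = threading.Lock()
        self._next_slot = time.monotonic()

    def acquire(self) -> None:
        with self._lock:
            now = time.monotonic()
            slot = max(self._next_slot, now)  # earliest free slot for this caller
            self._next_slot = slot + self._interval
        delay = slot - now
        if delay > 0:
            time.sleep(delay)  # sleep outside the lock so other threads can reserve slots


def extract_text(client, bucket: str, key: str) -> str:
    """One synchronous DetectDocumentText call on an S3 object; returns its LINE text."""
    resp = client.detect_document_text(
        Document={"S3Object": {"Bucket": bucket, "Name": key}}
    )
    return "\n".join(b["Text"] for b in resp["Blocks"] if b["BlockType"] == "LINE")


def process_all(client, bucket: str, keys, tps: float = 50, workers: int = 64) -> dict:
    """Keep many requests in flight while holding the total rate under `tps`."""
    limiter = RateLimiter(tps)

    def one(key: str):
        limiter.acquire()
        return key, extract_text(client, bucket, key)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(one, keys))
```

A possible client setup is `boto3.client("textract", config=Config(max_pool_connections=64, retries={"mode": "adaptive", "max_attempts": 10}))` with `Config` from `botocore.config`; the default connection pool is 10, which would otherwise cap concurrency. The quota is shared by every process that uses it, so several instances each need a proportionally smaller `tps`. For millions of pages, feed keys in batches or from a queue such as the one in the linked sample instead of a single in-memory list.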
AWS
answered a year ago
  • If my response helped, please consider accepting it so it can help others. Thanks!


Please reach out to your local AWS account team and/or Solutions Architect. Given the scale you're operating at, there are likely a few different conversations that need to be had.

AWS
EXPERT
answered a year ago
  • OK, thank you very much!
