How do I get the total number of Textract jobs running?


I am processing thousands of documents using Textract. Since Textract has a limit of 100 jobs running in parallel, I want to make sure that no more than 100 documents are sent to Textract at any one time. However, looking at the API, I cannot find anything that returns the total number of Textract jobs currently running. Please let me know if there is a way around this. Thanks

munna
asked 10 months ago, 251 views
1 Answer

Once a quota is set for the number of parallel jobs, any job that goes over the quota is throttled, and you need to handle this in your submission process (e.g. by resubmitting after a back-off period). There is also a CloudWatch metric that shows the number of throttled requests.
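As a rough illustration of resubmitting after a back-off period, here is a minimal sketch. It is not tied to any particular SDK: `submit` is any callable that starts one job, and `Throttled` is a hypothetical exception you would raise yourself when the service reports that the job was rejected for being over quota. The names, delays, and attempt count are all assumptions to adjust.

```python
import random
import time


class Throttled(Exception):
    """Hypothetical marker: raise this from `submit` when the job was throttled."""


def submit_with_backoff(submit, max_attempts=5, base_delay=1.0,
                        max_delay=60.0, sleep=time.sleep):
    """Call submit() and retry with exponential backoff while it is throttled.

    Returns whatever submit() returns; re-raises Throttled once
    max_attempts calls have all been throttled.
    """
    for attempt in range(max_attempts):
        try:
            return submit()
        except Throttled:
            if attempt == max_attempts - 1:
                raise
            # Cap doubles on each attempt: base_delay, 2x, 4x, ... up to max_delay.
            cap = min(max_delay, base_delay * 2 ** attempt)
            # Random wait below the cap so parallel workers do not retry in lockstep.
            sleep(random.uniform(0, cap))
```

With boto3, `submit` could wrap a call such as `start_document_text_detection` and translate a `ClientError` whose error code indicates throttling into `Throttled`. Which error codes to treat that way is worth confirming against the Textract documentation.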

But if you want a real-time view of the jobs, you need to track them in a repository such as DynamoDB to know precisely how many are running in parallel at any given time. This takes a bit more coding, and potentially AWS Step Functions to orchestrate job submission.
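The bookkeeping described above can be sketched as follows. This is an in-memory stand-in only, to show the logic: the class and method names are hypothetical, and in a real setup the set of job IDs would live in a shared store such as a DynamoDB table so that every submitter sees the same count.

```python
import threading


class InFlightTracker:
    """Tracks which jobs are running and refuses new ones at the limit.

    In-memory illustration; a shared table would replace the set in practice.
    """

    def __init__(self, limit=100):
        self._limit = limit
        self._jobs = set()
        self._lock = threading.Lock()

    def try_acquire(self, job_key):
        """Reserve a slot for job_key. Returns False if the limit is reached."""
        with self._lock:
            if job_key in self._jobs:
                return True  # already counted, nothing to do
            if len(self._jobs) >= self._limit:
                return False
            self._jobs.add(job_key)
            return True

    def release(self, job_key):
        """Free the slot once the job has finished (succeeded or failed)."""
        with self._lock:
            self._jobs.discard(job_key)

    def running(self):
        """Number of jobs currently counted as in flight."""
        with self._lock:
            return len(self._jobs)
```

A submitter would call `try_acquire` before starting a job and hold the document back when it returns `False`; `release` would be called when the job's completion is observed, for example from a completion notification or a status poll.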

AWS
answered 10 months ago
  • Thank you for the response, Behrang. That helps. However, the proposed workaround would involve a lot of work just to get the number of jobs currently running. It is surprising that Textract does not provide a list_jobs() function like other AWS services.
