How do I get the total number of Textract jobs currently running?


I am processing thousands of documents using Textract. Since Textract has a limit of 100 jobs running in parallel, I want to make sure that no more than 100 documents are sent to Textract at any one time. However, looking at the API, I cannot find anything that returns the total number of Textract jobs currently running. Please let me know if there is a way around this. Thanks.

munna
asked 10 months ago · 286 views
1 Answer

Once the quota for the number of parallel jobs is reached, the next job that goes over the quota will be throttled, and you need to handle this in your submission process (e.g. resubmission after a backoff period). There is also a CloudWatch metric that shows you the number of throttled requests.
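As a rough illustration of the resubmit-after-backoff idea, here is a minimal Python sketch. It assumes the submit call raises a botocore `ClientError`-style exception whose `response["Error"]["Code"]` carries the error code, and that `LimitExceededException`, `ProvisionedThroughputExceededException` and `ThrottlingException` are the codes worth retrying. The names `submit_with_backoff`, `error_code` and `RETRYABLE_CODES`, and the default delays, are made up for this example.

```python
import random
import time

# Error codes assumed here to mean "throttled" or "too many concurrent jobs".
RETRYABLE_CODES = {
    "LimitExceededException",
    "ProvisionedThroughputExceededException",
    "ThrottlingException",
}


def error_code(exc):
    """Return the AWS error code from a ClientError-style exception, or None."""
    response = getattr(exc, "response", None) or {}
    return response.get("Error", {}).get("Code")


def submit_with_backoff(start_job, max_attempts=8, base_delay=1.0,
                        max_delay=60.0, sleep=time.sleep):
    """Call start_job() until it succeeds, backing off on throttling errors."""
    for attempt in range(max_attempts):
        try:
            return start_job()
        except Exception as exc:
            last_attempt = attempt == max_attempts - 1
            if error_code(exc) not in RETRYABLE_CODES or last_attempt:
                raise  # not a throttling error, or out of retries
            # Exponential backoff with full jitter, capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, delay))
```

With boto3 this could be used as `submit_with_backoff(lambda: textract.start_document_text_detection(DocumentLocation={"S3Object": {"Bucket": bucket, "Name": key}}))`, where `textract`, `bucket` and `key` are your own client and document location.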

But if you want a real-time view of the jobs, you need to track them yourself in a repository such as DynamoDB so that you know precisely how many jobs are running in parallel at any given time. This takes a bit more coding, and potentially AWS Step Functions to orchestrate job submission.
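To make the bookkeeping idea concrete, here is a simplified single-process sketch in Python. It is not the DynamoDB/Step Functions setup described above: an in-memory dict stands in for the repository, and job completion is detected by polling `get_document_text_detection` for `JobStatus`. The `JobTracker` class and its parameters are hypothetical, and `client` is assumed to be a boto3 Textract client (or anything with the same two methods).

```python
import time
from collections import deque


class JobTracker:
    """Keep at most max_in_flight Textract jobs running, tracked in memory."""

    def __init__(self, client, max_in_flight=100):
        self.client = client
        self.max_in_flight = max_in_flight
        self.in_flight = {}  # JobId -> document key
        self.finished = {}   # document key -> final JobStatus

    def running_count(self):
        """Number of jobs this tracker believes are still running."""
        return len(self.in_flight)

    def refresh(self):
        """Poll each tracked job and drop the ones no longer in progress."""
        for job_id, key in list(self.in_flight.items()):
            response = self.client.get_document_text_detection(
                JobId=job_id, MaxResults=1
            )
            status = response["JobStatus"]
            if status != "IN_PROGRESS":
                self.finished[key] = status
                del self.in_flight[job_id]

    def run(self, bucket, keys, poll_seconds=5.0, sleep=time.sleep):
        """Submit every key, never exceeding max_in_flight concurrent jobs."""
        pending = deque(keys)
        while pending or self.in_flight:
            # Fill any free slots before waiting.
            while pending and len(self.in_flight) < self.max_in_flight:
                key = pending.popleft()
                response = self.client.start_document_text_detection(
                    DocumentLocation={"S3Object": {"Bucket": bucket, "Name": key}}
                )
                self.in_flight[response["JobId"]] = key
            sleep(poll_seconds)
            self.refresh()
        return self.finished
```

This only counts jobs started by this one process, so several submitters sharing an account would still need a shared store. Polling every job on each pass also consumes API calls of its own, so a completion notification (Textract can publish job completion to an SNS topic) may scale better than polling.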

AWS
answered 10 months ago
  • Thank you for the response, Behrang. That helps. However, the proposed workaround would involve a fair amount of work just to get the total number of jobs currently running. It is surprising that Textract does not provide a list_jobs() function like other AWS services.
