How do I get the total number of Textract jobs currently running?

I am processing thousands of documents using Textract. Since Textract has a limit of 100 jobs running in parallel, I want to make sure that no more than 100 documents are sent to Textract at any given time. However, looking at the API, I cannot find anything that returns the total number of Textract jobs currently running. Please let me know if there is a way around this. Thanks.

munna
asked 10 months ago · 287 views
1 Answer
Once a quota is set for the number of parallel jobs, any job that would exceed it is throttled, and your submission process needs to handle this (e.g. by resubmitting after a back-off period). There is also a CloudWatch metric that shows the number of throttled requests.
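As a rough illustration of the resubmit-after-back-off idea, here is a minimal Python sketch. The helper names (`start_with_backoff`, `error_code`) and the retry parameters are made up for this example; `start_job` is any callable that starts one Textract job, for instance a boto3 call. The error codes listed are the ones Textract documents for throttling and for exceeding the concurrent-job limit. To keep the sketch free of imports beyond the standard library, it inspects the `response` attribute that botocore's `ClientError` carries instead of importing botocore.

```python
import random
import time

# Error codes Textract documents for throttling or for too many concurrent async jobs.
RETRYABLE_CODES = {
    "LimitExceededException",
    "ProvisionedThroughputExceededException",
    "ThrottlingException",
}


def error_code(err):
    """Return the AWS error code from a botocore-style error, or None if absent."""
    response = getattr(err, "response", None) or {}
    return response.get("Error", {}).get("Code")


def start_with_backoff(start_job, max_attempts=8, base_delay=1.0,
                       max_delay=60.0, sleep=time.sleep):
    """Call start_job(), retrying throttling errors with jittered exponential backoff.

    start_job is a hypothetical zero-argument callable supplied by the caller.
    Non-throttling errors, and the last failed attempt, are re-raised unchanged.
    """
    for attempt in range(max_attempts):
        try:
            return start_job()
        except Exception as err:
            last_attempt = attempt == max_attempts - 1
            if error_code(err) not in RETRYABLE_CODES or last_attempt:
                raise
            # Delay ceiling doubles each attempt, capped at max_delay.
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, ceiling))
```

With boto3 this could be used roughly as `start_with_backoff(lambda: textract.start_document_text_detection(DocumentLocation={"S3Object": {"Bucket": bucket, "Name": key}}))`, where `textract`, `bucket`, and `key` are placeholders; the returned dictionary contains the `JobId`.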

But if you want a real-time view of the jobs, you need to track them in a repository such as DynamoDB so that you know precisely how many jobs are running in parallel at any given time. This requires a bit more coding, and potentially the use of AWS Step Functions to orchestrate job submission.
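One possible way to do the tracking described above is a single counter item in DynamoDB that is incremented with a conditional update before each submission and decremented when a job finishes. The sketch below is only an illustration under assumptions: the key attribute `pk`, the counter attribute `running`, the counter id `textract-jobs`, and the function names are all invented for this example, and `table` is expected to behave like a boto3 DynamoDB `Table` resource.

```python
MAX_RUNNING = 100  # limit quoted in the question; check the quota on your own account


def _error_code(err):
    """Return the AWS error code from a botocore-style error, or None if absent."""
    response = getattr(err, "response", None) or {}
    return response.get("Error", {}).get("Code")


def try_acquire_slot(table, counter_id="textract-jobs", limit=MAX_RUNNING):
    """Atomically increment the running-jobs counter if it is below the limit.

    Returns True when a slot was reserved, False when the limit is reached.
    """
    try:
        table.update_item(
            Key={"pk": counter_id},  # hypothetical key schema
            UpdateExpression="ADD #n :one",
            ConditionExpression="attribute_not_exists(#n) OR #n < :limit",
            ExpressionAttributeNames={"#n": "running"},
            ExpressionAttributeValues={":one": 1, ":limit": limit},
        )
        return True
    except Exception as err:
        if _error_code(err) == "ConditionalCheckFailedException":
            return False
        raise


def release_slot(table, counter_id="textract-jobs"):
    """Decrement the counter when a job finishes; never goes below zero."""
    try:
        table.update_item(
            Key={"pk": counter_id},
            UpdateExpression="ADD #n :minus_one",
            ConditionExpression="#n > :zero",
            ExpressionAttributeNames={"#n": "running"},
            ExpressionAttributeValues={":minus_one": -1, ":zero": 0},
        )
    except Exception as err:
        # A failed condition means the counter is already zero or missing; ignore it.
        if _error_code(err) != "ConditionalCheckFailedException":
            raise
```

A submitter would call `try_acquire_slot(table)` before starting each job and wait when it returns `False`. `release_slot(table)` would run once a job reaches a final status, which can be learned either from the SNS notification configured through the job's `NotificationChannel` or by polling the matching `Get...` operation for `JobStatus`. A single counter can drift if a worker crashes between acquiring and releasing, so storing one item per job is a reasonable alternative design.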

AWS
answered 10 months ago
  • Thank you for the response, Behrang. That helps. However, the proposed workaround involves a fair amount of work just to get the number of jobs currently running. It is surprising that Textract does not provide a list_jobs() function like other AWS services.