How do I get the total number of Textract jobs currently running?


I am processing thousands of documents with Textract. Since Textract has a limit of 100 jobs running in parallel, I want to make sure that no more than 100 documents are sent to Textract at any given time. However, looking at the API, I cannot find anything that returns the total number of Textract jobs currently running. Please let me know if there is a way around this. Thanks.

munna
asked 10 months ago · 287 views
1 Answer

Given the quota on the number of parallel jobs, the next job that goes over the quota is throttled, and you need to handle this in your submission process (e.g. by resubmitting after a back-off period). There is also a CloudWatch metric that shows the number of throttled requests.
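As a rough illustration of the resubmit-after-back-off idea, here is a minimal Python sketch. The names `submit_with_backoff`, `is_throttled`, and `start_job` are hypothetical helpers, not part of any AWS SDK. The sketch assumes the caller passes a zero-argument function that starts one Textract job, and that throttling surfaces as an exception carrying a botocore-style `response["Error"]["Code"]` with one of the listed codes.

```python
import random
import time

# Error codes assumed to indicate throttling or an exceeded concurrency limit.
THROTTLE_CODES = {
    "LimitExceededException",
    "ProvisionedThroughputExceededException",
    "ThrottlingException",
}


def is_throttled(exc):
    """Return True if exc carries a botocore-style throttling error code."""
    response = getattr(exc, "response", None) or {}
    return response.get("Error", {}).get("Code") in THROTTLE_CODES


def submit_with_backoff(start_job, max_attempts=8, base_delay=1.0,
                        max_delay=60.0, sleep=time.sleep):
    """Call start_job(), retrying throttled attempts with jittered back-off.

    start_job is a hypothetical zero-argument callable supplied by the
    caller; whatever it returns (for example a job id) is passed through.
    """
    for attempt in range(max_attempts):
        try:
            return start_job()
        except Exception as exc:
            # Re-raise anything that is not throttling, or the final failure.
            if not is_throttled(exc) or attempt == max_attempts - 1:
                raise
            # Exponential back-off capped at max_delay, with full jitter.
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, delay))
```

With boto3, `start_job` could be something like `lambda: textract.start_document_text_detection(DocumentLocation={"S3Object": {"Bucket": bucket, "Name": key}})["JobId"]`, where `textract`, `bucket`, and `key` are placeholders for your own client and object location.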

But if you want a real-time view of the jobs, you need to track them in a repository such as DynamoDB so that you know precisely how many jobs are running in parallel at any given time. This needs a bit more coding, and potentially AWS Step Functions to orchestrate job submission.
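The tracking idea can be sketched as follows. This is an in-memory stand-in only: the `JobTracker` class and its method names are hypothetical, and a set guarded by a lock takes the place of the DynamoDB table, so it works within a single process. In a real deployment the same bookkeeping would live in the shared repository and be updated when each job completes.

```python
import threading


class JobTracker:
    """Tracks running job ids and refuses new work above a fixed limit.

    Hypothetical in-memory stand-in for a shared store such as DynamoDB.
    """

    def __init__(self, limit=100):
        self.limit = limit
        self._running = set()
        self._lock = threading.Lock()

    def try_start(self, start_job):
        """Start a job if a slot is free; return its id, or None if full.

        start_job is a caller-supplied zero-argument callable that submits
        one job and returns its id. The lock is held during the call to
        keep the sketch simple; this serializes submissions.
        """
        with self._lock:
            if len(self._running) >= self.limit:
                return None
            job_id = start_job()
            self._running.add(job_id)
            return job_id

    def finish(self, job_id):
        """Free the slot held by job_id (call this when the job completes)."""
        with self._lock:
            self._running.discard(job_id)

    def running_count(self):
        """Return the number of jobs currently tracked as running."""
        with self._lock:
            return len(self._running)
```

A submitter would call `try_start` for each pending document and hold back whenever it returns `None`, while whatever component learns that a job has finished calls `finish` to release the slot.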

AWS
answered 10 months ago
  • Thank you for the response, Behrang. That helps. However, the proposed workaround involves a fair amount of work just to get the number of jobs currently running. It is surprising that Textract does not provide a list_jobs() function like other AWS services.
