1 Answer
Hi,
To solve this issue, we created a watchdog Lambda scheduled via cron every 5 minutes. For this scheduling, see https://docs.aws.amazon.com/eventbridge/latest/userguide/eb-run-lambda-schedule.html
This Lambda will list all jobs in the Running state and kill those that have been running for too long.
To list jobs, get their details and cancel them, see:
- https://awscli.amazonaws.com/v2/documentation/api/latest/reference/batch/list-jobs.html
- https://awscli.amazonaws.com/v2/documentation/api/latest/reference/batch/describe-jobs.html
- https://awscli.amazonaws.com/v2/documentation/api/latest/reference/batch/cancel-job.html
Use the equivalent of these CLI commands in your favorite language via the corresponding SDK.
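As a rough illustration, here is a minimal Python (boto3) sketch of such a watchdog. It is not the exact code we run: the queue name `my-job-queue`, the one-hour threshold, and the helper names `find_stale_jobs` and `stop_stale_jobs` are assumptions made up for the example. It relies on `list_jobs` returning `createdAt`/`startedAt` as epoch milliseconds in each job summary, so `describe_jobs` is only needed if you want more detail. Note that `cancel_job` only affects jobs that have not started yet (SUBMITTED, PENDING, RUNNABLE); jobs already in STARTING or RUNNING have to be stopped with `terminate_job`, so the sketch uses both.

```python
import time

JOB_QUEUE = "my-job-queue"   # hypothetical queue name, replace with yours
MAX_AGE_SECONDS = 60 * 60    # hypothetical threshold: one hour


def find_stale_jobs(batch, job_queue, status, max_age_seconds, now_ms=None):
    """Return the IDs of jobs in `status` that are older than the threshold."""
    now_ms = int(time.time() * 1000) if now_ms is None else now_ms
    stale = []
    kwargs = {"jobQueue": job_queue, "jobStatus": status}
    while True:
        page = batch.list_jobs(**kwargs)
        for job in page.get("jobSummaryList", []):
            # startedAt is only present once the job has started; before that,
            # fall back to createdAt (time since creation, not time in state).
            since_ms = job.get("startedAt") or job.get("createdAt")
            if since_ms is not None and now_ms - since_ms > max_age_seconds * 1000:
                stale.append(job["jobId"])
        token = page.get("nextToken")
        if not token:
            break
        kwargs["nextToken"] = token  # follow pagination
    return stale


def stop_stale_jobs(batch, job_queue, max_age_seconds, now_ms=None):
    """Cancel stale RUNNABLE jobs and terminate stale RUNNING jobs."""
    reason = "Stopped by watchdog: exceeded maximum allowed age"
    stopped = []
    for job_id in find_stale_jobs(batch, job_queue, "RUNNABLE", max_age_seconds, now_ms):
        batch.cancel_job(jobId=job_id, reason=reason)
        stopped.append(job_id)
    for job_id in find_stale_jobs(batch, job_queue, "RUNNING", max_age_seconds, now_ms):
        batch.terminate_job(jobId=job_id, reason=reason)
        stopped.append(job_id)
    return stopped


def lambda_handler(event, context):
    # boto3 is imported here so the helpers above can be tested without it.
    import boto3
    batch = boto3.client("batch")
    return {"stopped": stop_stale_jobs(batch, JOB_QUEUE, MAX_AGE_SECONDS)}
```

The Lambda's execution role would need permission for `batch:ListJobs`, `batch:CancelJob` and `batch:TerminateJob`. Passing the client in as a parameter is just a convenience so the logic can be exercised with a stub before deploying.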
Best,
Didier
According to a support case, the transition of jobs because of "CAPACITY:INSUFFICIENT_INSTANCE_CAPACITY" or "MISCONFIGURATION:COMPUTE_ENVIRONMENT_MAX_RESOURCE" only applies to EC2 jobs, not Fargate. Resolving this would require action on the AWS side. In the meantime, Didier's answer remains the most viable way to detect this situation.