AWS Batch Fargate jobs stuck in RUNNABLE state

I have an AWS Batch job queue on Fargate Spot that I'm running jobs on. It worked without issue until this morning. Now all jobs are stuck in the RUNNABLE status. I removed all pending jobs and submitted a single job with minimal resource requirements, but it still stalls in RUNNABLE.

I have confirmed:

  • I am not currently exhausting my Fargate vCPU quotas.
  • I have tried running the job with fewer resources, e.g. 2 vCPUs and 8 GB of memory.
  • The compute environment the jobs run on has a VALID status and its state is ENABLED.
  • The compute environment "Maximum vCPUs" is not maxed out.
  • There are no error logs in CloudWatch.

The AWS Region is Ohio (us-east-2).

What could be causing this issue? Is it possible that all Fargate Spot capacity is currently in use? And if so, is there any way to detect this?

Thanks

Adam
asked 3 months ago, 161 views
1 Answer
Accepted Answer

Possible causes and things to check:

  1. Fargate Spot capacity is generally available, but at times it can be limited or unavailable in a specific Region.
  2. Try running a small job on Fargate On-Demand to see whether it executes as expected.
  3. Double-check your service limits, not just for vCPUs but also for other resources and limits related to AWS Batch and ECS.
  4. Verify that the necessary permissions and IAM roles are configured correctly.
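
The compute environment and job status checks discussed in this thread can also be scripted. The following is a minimal sketch, not a definitive diagnostic: it assumes boto3 is installed and credentials are configured, and the helper names (`find_unhealthy_environments`, `runnable_job_reasons`, `check_live`) are made up for illustration. The `statusReason` field on a RUNNABLE job may or may not contain a useful hint, depending on the cause.

```python
from typing import Any


def find_unhealthy_environments(response: dict[str, Any]) -> list[dict[str, Any]]:
    """Return compute environments that are not both ENABLED and VALID.

    `response` has the shape returned by describe_compute_environments.
    """
    unhealthy = []
    for env in response.get("computeEnvironments", []):
        if env.get("state") != "ENABLED" or env.get("status") != "VALID":
            unhealthy.append(
                {
                    "name": env.get("computeEnvironmentName"),
                    "state": env.get("state"),
                    "status": env.get("status"),
                    "reason": env.get("statusReason", ""),
                }
            )
    return unhealthy


def runnable_job_reasons(response: dict[str, Any]) -> dict[str, str]:
    """Map job ID to statusReason for jobs that are still RUNNABLE.

    `response` has the shape returned by describe_jobs.
    """
    return {
        job["jobId"]: job.get("statusReason", "")
        for job in response.get("jobs", [])
        if job.get("status") == "RUNNABLE"
    }


def check_live(env_names: list[str], job_ids: list[str], region: str = "us-east-2"):
    """Query AWS Batch and run both checks (requires boto3 and credentials)."""
    import boto3  # imported lazily so the helpers above work without AWS access

    batch = boto3.client("batch", region_name=region)
    envs = batch.describe_compute_environments(computeEnvironments=env_names)
    jobs = batch.describe_jobs(jobs=job_ids)
    return find_unhealthy_environments(envs), runnable_job_reasons(jobs)
```

If both checks come back clean (no unhealthy environments, empty reasons) and a small On-Demand job runs fine, limited Spot capacity becomes the more likely explanation.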

Jagan
answered 3 months ago
  • This was the issue. I ended up creating two compute environments: the first uses Spot provisioning and the second uses On-Demand. This way, when Spot is not available, jobs fall back to On-Demand.
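
A rough sketch of that fallback setup with boto3 is shown here. It assumes both compute environments already exist; the queue and environment names are hypothetical placeholders, not the ones used in this thread. AWS Batch tries the compute environments attached to a queue in ascending `order`, and a single queue can combine FARGATE_SPOT and FARGATE environments.

```python
from typing import Any


def build_job_queue_request(
    queue_name: str,
    spot_env: str,
    on_demand_env: str,
    priority: int = 1,
) -> dict[str, Any]:
    """Build create_job_queue arguments that prefer Spot, then On-Demand."""
    return {
        "jobQueueName": queue_name,
        "state": "ENABLED",
        "priority": priority,
        # Lower "order" is tried first, so Spot comes before On-Demand.
        "computeEnvironmentOrder": [
            {"order": 1, "computeEnvironment": spot_env},
            {"order": 2, "computeEnvironment": on_demand_env},
        ],
    }


def create_job_queue(request: dict[str, Any], region: str = "us-east-2"):
    """Create the queue in AWS Batch (requires boto3 and credentials)."""
    import boto3  # imported lazily so the request builder works without AWS access

    return boto3.client("batch", region_name=region).create_job_queue(**request)


# Hypothetical names, for illustration only.
request = build_job_queue_request(
    queue_name="my-fargate-queue",
    spot_env="my-fargate-spot-env",
    on_demand_env="my-fargate-on-demand-env",
)
```

Calling `create_job_queue(request)` would then create the queue. Keeping the request builder separate from the API call makes it easy to inspect the ordering before anything is created.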
