Why are so few instances running with my AWS Batch Compute Environment config?

I have an AWS Batch setup, with:

  • a Compute Environment of type MANAGED with Min vCPUs = 0 and Max vCPUs = 512, and allocation strategy SPOT_CAPACITY_OPTIMIZED (max % of On-Demand price set to 100%)
  • a Job Definition targeting 16 vCPUs and 16,384 MiB of memory.

I often trigger hundreds of jobs at the same time, none of which apply any overrides to this JD.

What I don't understand is why I hardly ever have more than about 26 ECS tasks running at the same time. I'm looking at it now and only seeing 13 m4.10xlarge instances running (40 vCPUs each, so presumably 2 tasks per instance == 26 tasks), whilst I still have more than 100 tasks in RUNNABLE state. Right now the Desired vCPUs is showing "480" (but I'm not sure what that means).

I have another JD asking for 64 vCPUs and 65,536 MiB of memory (for bigger tasks), and I tried testing another "spot" CE with Max vCPUs 4096, thinking the larger number would get me more instances more quickly. Just the opposite, in fact: I had just 6 r4.16xlarge instances running at the same time, and the whole thing took many hours.

How can I configure things to use as many instances in parallel to clear the queue as quickly as possible? (In case it's relevant, I'm using eu-west-1)

asked 2 years ago, 705 views
1 Answer

You specify the upper limit for vCPUs in your compute environment, but there is no guarantee that the environment can scale to that limit. The availability of instances in the Availability Zones of your chosen Region, your Service Quotas, or the number of free IP addresses in your subnets could each be the limiting factor in your case.
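As a rough check on the numbers in the question, the arithmetic can be sketched as below. This is only an estimate under stated assumptions: whole instances count against Max vCPUs, tasks are packed by vCPU only (memory is ignored), and the function name `max_concurrent_tasks` is made up for illustration. The optional one-instance overshoot reflects the documented behaviour that, with SPOT_CAPACITY_OPTIMIZED, Batch may exceed Max vCPUs by at most a single instance.

```python
def max_concurrent_tasks(max_vcpus, instance_vcpus, task_vcpus, extra_instances=0):
    """Estimate (instances, tasks) that can run at once under a Max vCPUs cap."""
    # Whole instances that fit under the cap, plus any assumed overshoot.
    instances = max_vcpus // instance_vcpus + extra_instances
    # Tasks packed per instance by vCPU count only (memory is ignored here).
    tasks_per_instance = instance_vcpus // task_vcpus
    return instances, instances * tasks_per_instance


# First environment: Max vCPUs 512, m4.10xlarge (40 vCPUs), 16-vCPU jobs.
print(max_concurrent_tasks(512, 40, 16))                     # (12, 24)
print(max_concurrent_tasks(512, 40, 16, extra_instances=1))  # (13, 26)

# Second environment: Max vCPUs 4096, r4.16xlarge (64 vCPUs), 64-vCPU jobs.
print(max_concurrent_tasks(4096, 64, 64))                    # (64, 64)
```

Under these assumptions, 12 x 40 = 480 matches the Desired vCPUs you report, and 13 instances with 26 tasks is what a one-instance overshoot would give, which suggests the first environment may simply be at its Max vCPUs cap. The second environment, with 6 instances against a ceiling of roughly 64, is far below its cap, so quotas or Spot capacity look like the more likely limits there.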

Have you checked your account's Service Quotas? You will find limits on the number of Spot requests as well as on the number of running EC2 instances per instance family. For information on how to view your Service Quotas and request increases, see: https://docs.aws.amazon.com/general/latest/gr/aws_service_limits.html
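If you prefer to check the quotas from a script rather than the console, something along these lines might work. It is a minimal sketch, assuming boto3 is installed and credentials are configured; the helper name `spot_quotas` is hypothetical, and filtering on the word "Spot" in the quota name is just a convenient heuristic. The client calls used are the Service Quotas `list_service_quotas` operation and its paginator.

```python
def spot_quotas(client):
    """Return {quota name: value} for EC2 quotas whose name mentions Spot."""
    quotas = {}
    paginator = client.get_paginator("list_service_quotas")
    # Page through every applied EC2 quota for the client's Region.
    for page in paginator.paginate(ServiceCode="ec2"):
        for quota in page["Quotas"]:
            if "Spot" in quota["QuotaName"]:
                quotas[quota["QuotaName"]] = quota["Value"]
    return quotas


# Usage (needs boto3 and AWS credentials; not executed here):
#   import boto3
#   client = boto3.client("service-quotas", region_name="eu-west-1")
#   for name, value in spot_quotas(client).items():
#       print(f"{value:>8.0f}  {name}")
```

The Spot request quotas are expressed in vCPUs, so the values can be compared directly with the Max vCPUs of the compute environment.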

I take it from your description that you restrict the instance family to either m4 or r4. If your containers can run on multiple instance types, you should follow AWS's best practices for Spot and diversify the instance types in the Compute Environment. AWS Batch already offers the 'optimal' preset, for example, which chooses instances from the m, c and r families.
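For illustration, a diversified Spot configuration could look roughly like the sketch below. It only builds the `computeResources` structure that the Batch `create_compute_environment` call accepts and does not create anything. The helper name `spot_compute_resources`, the default list of families, and the subnet, security group and role values in the example call are all placeholders chosen for this sketch, not values from your account.

```python
def spot_compute_resources(subnets, security_group_ids, instance_role,
                           max_vcpus=512, instance_types=None):
    """Build a computeResources dict for a diversified Spot environment."""
    if instance_types is None:
        # Several families and generations, so Batch has more Spot pools to draw on.
        instance_types = ["m5", "m4", "c5", "c4", "r5", "r4"]
    return {
        "type": "SPOT",
        "allocationStrategy": "SPOT_CAPACITY_OPTIMIZED",
        "minvCpus": 0,
        "maxvCpus": max_vcpus,
        "bidPercentage": 100,  # pay up to 100% of the On-Demand price
        "instanceTypes": list(instance_types),
        "subnets": list(subnets),
        "securityGroupIds": list(security_group_ids),
        "instanceRole": instance_role,
    }


# Placeholder identifiers, for illustration only.
resources = spot_compute_resources(
    subnets=["subnet-aaaa1111", "subnet-bbbb2222", "subnet-cccc3333"],
    security_group_ids=["sg-dddd4444"],
    instance_role="ecsInstanceRole",
    max_vcpus=1024,
)
print(resources["instanceTypes"])
```

Listing subnets in more than one Availability Zone widens the set of Spot pools in the same way that listing more instance families does.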

AWS
answered 2 years ago
  • Thanks for your response. I will have a look at the service limits. I'm pretty sure that a few weeks ago I was seeing a much higher number of spot instances being used, and nothing was changed on the limit side.

    Is there any way to "see" the availability of instances of any particular type in the AZs at any one time (to allow me to compare what my account is using with what's generally available "out there")?

    As for instance type, I am already using "optimal". I was just mentioning m4 and r4 as instance types I was seeing being used.

  • No, it is unfortunately not possible to see available capacity for AWS regions and AZs.
