AWS Batch does not scale down EC2 instances

We have an AWS Batch compute environment that is set up to use EC2 On-demand instances.

Whenever we run a large number of tasks, Batch scales our group of instances up to the number of CPUs we require, but it never ends up scaling them back down, even when we have 0 Batch tasks running.

Looking at the Auto Scaling group that Batch manages, I see it has scale-in protection enabled. This means the group doesn't terminate EC2 instances when the desired capacity is decreased.

The Auto Scaling group's activity history shows that it always scales up correctly, but scaling down fails because of the scale-in protection. For example:

Status: CANCELLED
Error: Could not scale to desired capacity because all remaining instances are protected from scale-in.
Cause: At 2022-02-02T13:11:53Z a user request update of AutoScalingGroup constraints to min: 0, max: 18, desired: 18 changing the desired capacity from 96 to provide the desired capacity of 18. At 2022-02-02T13:12:05Z group reached equilibrium.
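The same check can be scripted instead of read from the console. The following is a minimal sketch, assuming a boto3 Auto Scaling client is passed in; the function name and the group name in the usage comment are hypothetical, and only the first page of activities is inspected.

```python
def blocked_scale_in_activities(autoscaling_client, group_name):
    """Return cancelled scaling activities that mention scale-in protection.

    `autoscaling_client` is assumed to be a boto3 "autoscaling" client
    (or any object exposing describe_scaling_activities).
    """
    response = autoscaling_client.describe_scaling_activities(
        AutoScalingGroupName=group_name
    )
    blocked = []
    for activity in response.get("Activities", []):
        status = str(activity.get("StatusCode", "")).lower()
        message = activity.get("StatusMessage", "")
        # Match the error text quoted in the question above.
        if status == "cancelled" and "protected from scale-in" in message:
            blocked.append(
                {
                    "start": activity.get("StartTime"),
                    "message": message,
                    "cause": activity.get("Cause", ""),
                }
            )
    return blocked


# Usage (requires boto3 and AWS credentials; the group name is a placeholder):
#   import boto3
#   client = boto3.client("autoscaling")
#   for item in blocked_scale_in_activities(client, "my-batch-asg"):
#       print(item["start"], item["message"])
```

Taking the client as a parameter keeps the function easy to exercise with a stub before pointing it at a real account.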

My guess is that this is why it isn't scaling down. But this Auto Scaling group is managed by Batch, and Batch should scale instances down relatively quickly, so the behavior doesn't make sense to me.

This is costing us a lot in unused EC2 time, so if anyone can offer any insights, that'd be great.

  • To clarify: are you using a managed compute environment? Did you use a launch template when setting up the compute environment? Termination protection can be set as part of your launch template.
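Both questions in this comment can be answered from the compute environment's description. This is a rough sketch, assuming a boto3 Batch client is passed in; the function name and the environment name in the usage comment are hypothetical. It also reports the vCPU bounds, since a non-zero minimum keeps instances running regardless of job load.

```python
def summarize_compute_environment(batch_client, name):
    """Summarize the settings of one Batch compute environment.

    `batch_client` is assumed to be a boto3 "batch" client
    (or any object exposing describe_compute_environments).
    """
    response = batch_client.describe_compute_environments(
        computeEnvironments=[name]
    )
    environments = response.get("computeEnvironments", [])
    if not environments:
        raise LookupError(f"compute environment {name!r} not found")
    env = environments[0]
    resources = env.get("computeResources", {})
    return {
        "managed": env.get("type") == "MANAGED",
        "uses_launch_template": "launchTemplate" in resources,
        "min_vcpus": resources.get("minvCpus"),
        "desired_vcpus": resources.get("desiredvCpus"),
        "max_vcpus": resources.get("maxvCpus"),
    }


# Usage (requires boto3 and AWS credentials; the name is a placeholder):
#   import boto3
#   print(summarize_compute_environment(boto3.client("batch"), "my-compute-env"))
```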

1 Answer
Hey Jacques, it looks like your Auto Scaling group is configured to protect instances from scale-in operations.

To change the instance scale-in protection setting for your Auto Scaling group:

  1. Open the Amazon EC2 Auto Scaling console at https://console.aws.amazon.com/ec2autoscaling/.

  2. Select the check box next to the Auto Scaling group.

  3. A split pane opens up in the bottom part of the Auto Scaling groups page, showing information about the group that's selected.

  4. On the Details tab, choose Advanced configurations, Edit.

  5. For Instance scale-in protection, de-select "Enable instance scale-in protection".

  6. Choose Update.

For more info, please see "Modify the instance scale-in protection setting for a group".
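The console steps above can also be sketched with the API. This is a minimal example, assuming a boto3 Auto Scaling client is passed in; the function name and the group name in the usage comment are hypothetical. Note that, as a later comment in this thread points out, Batch enables this setting by default on groups it manages, so treat the sketch as an illustration of the steps rather than a recommended fix.

```python
def disable_group_scale_in_protection(autoscaling_client, group_name):
    """Turn off scale-in protection for newly launched instances.

    `autoscaling_client` is assumed to be a boto3 "autoscaling" client.
    Returns the group's setting as reported after the update.
    """
    autoscaling_client.update_auto_scaling_group(
        AutoScalingGroupName=group_name,
        NewInstancesProtectedFromScaleIn=False,
    )
    # Read the setting back to confirm the change was applied.
    response = autoscaling_client.describe_auto_scaling_groups(
        AutoScalingGroupNames=[group_name]
    )
    groups = response.get("AutoScalingGroups", [])
    if not groups:
        raise LookupError(f"Auto Scaling group {group_name!r} not found")
    return groups[0].get("NewInstancesProtectedFromScaleIn")


# Usage (requires boto3 and AWS credentials; the group name is a placeholder):
#   import boto3
#   still_on = disable_group_scale_in_protection(
#       boto3.client("autoscaling"), "my-batch-asg"
#   )
```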

Note: The group-level setting only applies to instances launched after it is changed. Instances that are already running keep their scale-in protection, so you might have to remove it on each instance.

To change the instance scale-in protection setting for the current instances:

  1. Open the Amazon EC2 Auto Scaling console at https://console.aws.amazon.com/ec2autoscaling/.

  2. Select the check box next to your Auto Scaling group.

  3. A split pane opens up in the bottom part of the Auto Scaling groups page, showing information about the group that's selected.

  4. On the Instance management tab, in Instances, select all of your instances.

  5. Choose Actions, Remove scale-in protection. When prompted, choose Remove scale-in protection.

For more info, please see "Modify the instance scale-in protection setting for an instance".
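For a group with many instances, the per-instance steps above could be scripted along these lines. This is a sketch, assuming a boto3 Auto Scaling client is passed in; the function name is hypothetical, and the default chunk size of 50 reflects my understanding of the per-call limit on instance IDs, which you should verify against the current API reference.

```python
def remove_instance_scale_in_protection(autoscaling_client, group_name, batch_size=50):
    """Remove scale-in protection from every protected instance in a group.

    `autoscaling_client` is assumed to be a boto3 "autoscaling" client.
    `batch_size` is an assumed per-call limit on instance IDs.
    Returns the list of instance IDs that were updated.
    """
    response = autoscaling_client.describe_auto_scaling_groups(
        AutoScalingGroupNames=[group_name]
    )
    groups = response.get("AutoScalingGroups", [])
    if not groups:
        raise LookupError(f"Auto Scaling group {group_name!r} not found")
    protected = [
        instance["InstanceId"]
        for instance in groups[0].get("Instances", [])
        if instance.get("ProtectedFromScaleIn")
    ]
    # Send the IDs in chunks so no single call exceeds the assumed limit.
    for start in range(0, len(protected), batch_size):
        autoscaling_client.set_instance_protection(
            InstanceIds=protected[start:start + batch_size],
            AutoScalingGroupName=group_name,
            ProtectedFromScaleIn=False,
        )
    return protected


# Usage (requires boto3 and AWS credentials; the group name is a placeholder):
#   import boto3
#   remove_instance_scale_in_protection(boto3.client("autoscaling"), "my-batch-asg")
```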

Please let me know if this works or if you have any further concerns and I will be happy to help!

AWS Support Engineer, answered 2 years ago
  • Thanks for the answer, Venkat!

    What's strange is that AWS Batch created the Auto Scaling group with scale-in protection enabled by default.

    It shouldn't do that, right?

    Is there maybe a config option for AWS Batch that I misconfigured that somehow caused it to enable scale-in protection by default, or could this be a bug with how Batch manages Auto Scaling groups?

  • Hey Jacques,

    When you create a compute environment with managed scaling, the Auto Scaling group is created with "Protected from scale in" enabled by default. This ensures that only AWS Batch scales the desired count down. You are not supposed to change the desired count manually.

    If you are using managed scaling and the instances are not scaled down even when the task count is 0, please create a support case with AWS. The support team has the visibility to explore your configuration and determine the best course of action.

    Hope this helps!
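Before opening a support case, it may help to confirm that the queue really has no active jobs, since jobs waiting in states such as RUNNABLE can keep capacity requested. This is a minimal sketch, assuming a boto3 Batch client is passed in; the function name and the queue name in the usage comment are hypothetical.

```python
ACTIVE_STATUSES = ("SUBMITTED", "PENDING", "RUNNABLE", "STARTING", "RUNNING")


def count_active_jobs(batch_client, job_queue):
    """Count jobs in each non-terminal status for one Batch job queue.

    `batch_client` is assumed to be a boto3 "batch" client
    (or any object exposing list_jobs).
    """
    counts = {}
    for status in ACTIVE_STATUSES:
        total = 0
        kwargs = {"jobQueue": job_queue, "jobStatus": status}
        while True:
            page = batch_client.list_jobs(**kwargs)
            total += len(page.get("jobSummaryList", []))
            token = page.get("nextToken")
            if not token:
                break
            # Follow pagination until the service stops returning a token.
            kwargs["nextToken"] = token
        counts[status] = total
    return counts


# Usage (requires boto3 and AWS credentials; the queue name is a placeholder):
#   import boto3
#   counts = count_active_jobs(boto3.client("batch"), "my-job-queue")
#   print(counts, "idle" if sum(counts.values()) == 0 else "busy")
```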
