Autoscaling scheduled actions triggered successfully but did nothing



We are trying to understand a strange behavior in EC2 Auto Scaling.

We have an Auto Scaling group that can scale from 3 to 60 machines. We configured a target tracking scaling policy on CPU reservation with a target of 95%, plus three scheduled scaling actions that increase the desired capacity at certain times of the day (our workloads increase suddenly at specific times).

Usually everything works fine: the cluster grows or shrinks with the workload during the day, and the scheduled actions prepare it to absorb massive workloads at specific times.

But we encountered a behavior that we cannot explain. We have an action scheduled at 2am that increases the desired capacity to 20, and another one 15 minutes later that increases it to 60.

The Auto Scaling group's activity log shows that these actions were triggered.

But the cluster never actually grew. Our capacity graph shows that the desired capacity remained at 3, even though the Auto Scaling group reports that the action to grow the cluster was triggered.

We had to manually remove the three running instances and set the desired capacity to 60 to unlock the autoscaling. Now everything works fine and the scheduled actions can properly change the desired capacity again...

Is there anything that could explain this behavior? Why did the scheduled actions do nothing when we can see them triggered in the Auto Scaling group's activity log? And why did removing the remaining instances and setting the desired capacity to the maximum unlock everything?


asked 2 months ago, 256 views
1 Answer
Accepted Answer

There was likely a conflict between the low-usage alarm of the target tracking policy and the scheduled scaling actions: while utilization is still low, the policy can scale the group back in right after a scheduled action raises the desired capacity. To prevent this, have the scheduled actions increase the MinSize of the ASG instead of the desired capacity. That way the scaling policy can't scale in below the new minimum while usage is still low, before your traffic spike hits. You'll then want a fourth scheduled action a little while later (any time after your traffic spike has started) that lowers the MinSize back down to 3, so the target tracking policy can still gracefully scale in the ASG when utilization goes down. You could optionally also keep the scheduled actions that set the desired capacity.
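
As a rough illustration of the MinSize approach, here is a minimal Python sketch that builds the parameters for boto3's `put_scheduled_update_group_action`. The group name `my-ecs-asg`, the action names, and the 03:00 reset time are all hypothetical; the 02:00 and 02:15 steps mirror the schedule from the question. Recurrence expressions are evaluated in UTC unless a time zone is set, so adjust them to your own schedule.

```python
from typing import Any, Dict, List


def build_min_size_schedule(
    asg_name: str,
    ramp_steps: Dict[str, int],
    reset_cron: str,
    baseline_min: int,
) -> List[Dict[str, Any]]:
    """Build keyword arguments for put_scheduled_update_group_action calls.

    ramp_steps maps a cron expression to the MinSize to apply at that time.
    Only MinSize is set, so DesiredCapacity is left to the scaling policy.
    """
    actions: List[Dict[str, Any]] = []
    for index, (cron, min_size) in enumerate(ramp_steps.items(), start=1):
        actions.append({
            "AutoScalingGroupName": asg_name,
            "ScheduledActionName": f"raise-min-step-{index}",  # hypothetical name
            "Recurrence": cron,
            "MinSize": min_size,
        })
    # Final action: drop the floor back to the baseline so the target
    # tracking policy can scale in once utilization goes down.
    actions.append({
        "AutoScalingGroupName": asg_name,
        "ScheduledActionName": "reset-min",  # hypothetical name
        "Recurrence": reset_cron,
        "MinSize": baseline_min,
    })
    return actions


def apply_schedule(actions: List[Dict[str, Any]]) -> None:
    """Send the actions to AWS (requires boto3 and valid credentials)."""
    import boto3  # imported lazily so the builder works without boto3

    client = boto3.client("autoscaling")
    for action in actions:
        client.put_scheduled_update_group_action(**action)


# Assumed values: group name and reset time are placeholders.
ACTIONS = build_min_size_schedule(
    asg_name="my-ecs-asg",
    ramp_steps={"0 2 * * *": 20, "15 2 * * *": 60},
    reset_cron="0 3 * * *",
    baseline_min=3,
)
```

Calling `apply_schedule(ACTIONS)` would create or update the three scheduled actions. Raising MinSize forces the group to launch instances up to the new floor, and the reset action only lowers the floor; it does not terminate instances by itself.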

As a side note: it sounds like you're using ECS. I'd recommend looking into capacity providers, which have a special metric designed for scaling ASGs that back ECS clusters. You can use it to replace your current target tracking policy.
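
For reference, a capacity provider with managed scaling could be set up along these lines. This is only a sketch: the provider name and the ASG ARN are placeholders, and the target capacity of 100 is an assumed starting point, not a recommendation for your workload. With managed scaling enabled, ECS manages a target tracking policy on the ASG based on the `CapacityProviderReservation` metric.

```python
from typing import Any, Dict


def build_capacity_provider_request(
    name: str,
    asg_arn: str,
    target_capacity: int = 100,
) -> Dict[str, Any]:
    """Build keyword arguments for the ECS create_capacity_provider call."""
    # ECS accepts a target capacity greater than 0 and at most 100.
    if not 1 <= target_capacity <= 100:
        raise ValueError("target_capacity must be between 1 and 100")
    return {
        "name": name,
        "autoScalingGroupProvider": {
            "autoScalingGroupArn": asg_arn,
            "managedScaling": {
                "status": "ENABLED",
                "targetCapacity": target_capacity,
            },
            # Left disabled here to keep the sketch small.
            "managedTerminationProtection": "DISABLED",
        },
    }


def create_capacity_provider(request: Dict[str, Any]) -> Dict[str, Any]:
    """Send the request to AWS (requires boto3 and valid credentials)."""
    import boto3  # imported lazily so the builder works without boto3

    return boto3.client("ecs").create_capacity_provider(**request)


# Placeholder name and ARN; substitute the values for your own group.
REQUEST = build_capacity_provider_request(
    name="my-capacity-provider",
    asg_arn=(
        "arn:aws:autoscaling:REGION:ACCOUNT_ID:autoScalingGroup:"
        "UUID:autoScalingGroupName/my-ecs-asg"
    ),
)
```

The provider still has to be associated with the cluster afterwards (for example with `put_cluster_capacity_providers`) before ECS starts scaling the group through it.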

answered 2 months ago
