Auto Scaling Group not scaling based on ECS desired task count
I have an EC2-backed ECS cluster which contains an ASG (using Cluster Auto Scaling) that is allowed to scale between 1 and 5 EC2 instances. There is also a service defined on this cluster which is set to scale between 1 and 5 tasks, with each task reserving almost the full resources of a single instance.
I have configured the service to scale its desired task count depending on the size of various queues within an Amazon MQ instance, which is all handled by CloudWatch alarms. The scaling of the desired task count works as expected, but the ASG doesn't provision new EC2 instances to fit the number of desired tasks unless I manually go in and change the desired capacity of the ASG. This means the new tasks never get deployed, as ECS can't find any suitable instances to deploy them to.
I don't know if I'm missing something, but all the documentation I have found on ECS Auto Scaling Groups says that it should scale instances to fit the total resources requested by the desired number of tasks.
If I manually increase the desired capacity in the ASG and add an additional task that gets deployed on that new instance, then the CapacityProviderReservation metric still remains at 100%. If I then remove that second task, after a while the ASG will scale in and remove the instance that no longer has any tasks running on it, which is the expected behaviour.
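For reference, here is a rough sketch of how I understand the CapacityProviderReservation metric to be calculated, based on the AWS documentation for Cluster Auto Scaling: M / N × 100, where N is the number of instances currently in the ASG and M is the number of instances needed to host all running and provisioning tasks. The function name and the one-task-per-instance assumption are mine, purely for illustration.

```python
def capacity_provider_reservation(instances_needed: int, instances_running: int) -> float:
    """Approximate the documented metric: M / N * 100.

    instances_needed (M): instances required for all running and provisioning tasks.
    instances_running (N): instances currently in the Auto Scaling group.
    """
    if instances_needed < 0 or instances_running < 0:
        raise ValueError("instance counts must be non-negative")
    if instances_running == 0:
        # Documented special cases: 100 when nothing is needed,
        # 200 when tasks need capacity but no instances exist yet.
        return 100.0 if instances_needed == 0 else 200.0
    return instances_needed / instances_running * 100


# Scenarios from this question, assuming one task fills one instance.
print(capacity_provider_reservation(1, 1))  # 1 task, 1 instance
print(capacity_provider_reservation(2, 2))  # 2 tasks, 2 instances (after manual scale-out)
print(capacity_provider_reservation(1, 2))  # 1 task, 2 instances (scale-in expected)
print(capacity_provider_reservation(2, 1))  # 2 tasks counted, 1 instance (scale-out expected)
```

With a target capacity of 100%, a value of 100 in the first two cases means no scaling is needed, which matches what I see. The last case can only occur if the second task is actually counted towards M, which as far as I understand requires it to be in the PROVISIONING state rather than failing placement outright.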
Any pointers would be greatly appreciated.
As a side note, this is all set up using the Python CDK.
Edit: Clarified that the ASG is currently using CAS (as far as I can tell) and added details about scaling in working as expected
Many thanks, Tom
Hi Tom,
The ASG doesn't know about the ECS service and won't scale by itself. There are two common ways to set up scaling on it to make sure instances are added to the cluster when needed:
- Use the same metric for both the ASG and the service so that they both get alarms triggered at the same time. This could be subject to a race condition where the queue length metric triggers an alarm for the ASG or the service, and the alarm moves back to OK before the other one scales.
- Use CAS (Cluster Auto Scaling) to trigger the ASG based on the tasks running/pending in the cluster.
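Since you mention the Python CDK, a minimal sketch of the second option might look like the following. It assumes CDK v2; the stack name, construct IDs, instance type, memory reservation, and container image are placeholders rather than anything taken from your setup. The part worth checking is that the service uses a capacity provider strategy: as far as I know, CAS only scales out for tasks launched through the capacity provider, not for tasks started with the plain EC2 launch type.

```python
from aws_cdk import Stack
from aws_cdk import aws_autoscaling as autoscaling
from aws_cdk import aws_ec2 as ec2
from aws_cdk import aws_ecs as ecs
from constructs import Construct


class QueueWorkerStack(Stack):  # hypothetical stack name
    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)

        vpc = ec2.Vpc(self, "Vpc", max_azs=2)
        cluster = ecs.Cluster(self, "Cluster", vpc=vpc)

        # ASG allowed to scale between 1 and 5 instances, as in the question.
        asg = autoscaling.AutoScalingGroup(
            self, "Asg",
            vpc=vpc,
            instance_type=ec2.InstanceType("t3.medium"),  # placeholder
            machine_image=ecs.EcsOptimizedImage.amazon_linux2(),
            min_capacity=1,
            max_capacity=5,
        )

        # Managed scaling lets ECS adjust the ASG's desired capacity.
        capacity_provider = ecs.AsgCapacityProvider(
            self, "AsgCapacityProvider",
            auto_scaling_group=asg,
            enable_managed_scaling=True,
            target_capacity_percent=100,
        )
        cluster.add_asg_capacity_provider(capacity_provider)

        task_definition = ecs.Ec2TaskDefinition(self, "TaskDef")
        task_definition.add_container(
            "worker",
            image=ecs.ContainerImage.from_registry("amazon/amazon-ecs-sample"),  # placeholder
            memory_reservation_mib=3500,  # placeholder: most of one instance
        )

        # Launch through the capacity provider instead of the EC2 launch type.
        ecs.Ec2Service(
            self, "Service",
            cluster=cluster,
            task_definition=task_definition,
            capacity_provider_strategies=[
                ecs.CapacityProviderStrategy(
                    capacity_provider=capacity_provider.capacity_provider_name,
                    weight=1,
                )
            ],
        )
```

If the service was created without a capacity provider strategy (and the cluster has no default strategy), that would be the first thing I would compare against this sketch.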
Hi Shahad,
Thanks for the response. I'm fairly certain that I'm currently using CAS. In the CDK I attached the Auto Scaling group to the ECS cluster as a capacity provider which, based on the link you provided and elsewhere, should cause ECS to manage the EC2 scaling.
Interestingly, I manually increased the desired capacity of the ASG to 2, and once my cluster scaled down to a single task, the ASG scaled in the extra EC2 instance that was running. So scaling in seems to be working correctly, but scaling out isn't.
It is as if it isn't detecting tasks until they are at least pending, which obviously they can't get to unless there is capacity available.