AWS AutoScaling group created many more instances than the configured max capacity. Why?

I have an AWS AutoScaling group configured with min/desired/max capacity of 1/1/5.

To this ASG, I've attached a dynamic scaling policy that targets an average CPU utilization of 75%.
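For reference, a setup like this could be expressed as the keyword arguments for boto3's `update_auto_scaling_group` and `put_scaling_policy` calls. This is a minimal sketch, assuming a target tracking policy on the predefined `ASGAverageCPUUtilization` metric; the group name `my-asg` and policy name `cpu-target-75` are hypothetical, since the question does not give them.

```python
# Hypothetical names; the question does not state the real ones.
ASG_NAME = "my-asg"


def group_size_kwargs(asg_name: str, min_size: int, desired: int, max_size: int) -> dict:
    """Arguments for autoscaling.update_auto_scaling_group (min/desired/max)."""
    return {
        "AutoScalingGroupName": asg_name,
        "MinSize": min_size,
        "DesiredCapacity": desired,
        "MaxSize": max_size,
    }


def cpu_target_policy_kwargs(asg_name: str, target_cpu: float) -> dict:
    """Arguments for autoscaling.put_scaling_policy: target tracking on average CPU."""
    return {
        "AutoScalingGroupName": asg_name,
        "PolicyName": "cpu-target-75",  # hypothetical policy name
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingConfiguration": {
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "ASGAverageCPUUtilization",
            },
            "TargetValue": target_cpu,
        },
    }


# With credentials configured, these would be applied roughly like this:
#   client = boto3.client("autoscaling")
#   client.update_auto_scaling_group(**group_size_kwargs(ASG_NAME, 1, 1, 5))
#   client.put_scaling_policy(**cpu_target_policy_kwargs(ASG_NAME, 75.0))
```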

Because of a bug in some code I deployed to EC2 last night, CPU utilization on the one running instance shot up to 100%. The ASG provisioned more instances in response, but because the same buggy code was deployed to each new instance, every one of them also ran at high CPU utilization.

In this scenario, I expected the ASG to increase capacity to 5 instances and then stop. To my surprise, however, the ASG kept provisioning more instances, as if it were increasing the max capacity by itself. After running this way for some time, it had provisioned a total of 31 instances.
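The expectation above can be illustrated with a toy model. AWS does not publish the exact target tracking algorithm, so this sketch assumes a simple proportional rule (new capacity = ceil(current × CPU / target)) clamped to [min, max]. With CPU pinned at 100% against a 75% target, the desired capacity climbs and then stays at the max of 5.

```python
def simulate_target_tracking(desired: int, min_size: int, max_size: int,
                             cpu: int, target: int, steps: int) -> list[int]:
    """Toy proportional scaling model (an assumption, not AWS's actual algorithm)."""
    history = [desired]
    for _ in range(steps):
        # Integer ceiling of desired * cpu / target, avoiding float rounding.
        proposed = -(-desired * cpu // target)
        # MaxSize and MinSize bound the desired capacity.
        desired = max(min_size, min(max_size, proposed))
        history.append(desired)
    return history


print(simulate_target_tracking(desired=1, min_size=1, max_size=5, cpu=100, target=75, steps=5))
# [1, 2, 3, 4, 5, 5]
```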

I want to know why the ASG went over the max capacity I configured. The ASG documentation (https://docs.aws.amazon.com/autoscaling/ec2/userguide/as-scale-based-on-demand.html?icmpid=docs_ec2as_help_panel#as-how-scaling-policies-work) says this should not be possible.

ack_inc
Asked 2 months ago · 156 views
1 answer

Accepted answer

There are only 2 situations[1] where the ASG can change its own max capacity, and neither should apply to your situation. I suggest searching CloudTrail for EventName=UpdateAutoScalingGroup to see whether a user or some automated script increased the max.
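One way to do that search is CloudTrail's `LookupEvents` API, filtered on the event name. The sketch below assumes the shape of a boto3 `lookup_events` response (each event has `Username` and a JSON string in `CloudTrailEvent`) and assumes the request parameters are recorded under `autoScalingGroupName` and `maxSize`; the group name `my-asg` is hypothetical. Note that CloudTrail event history only covers the last 90 days.

```python
import json

# Filter passed to cloudtrail.lookup_events (or its "lookup_events" paginator).
LOOKUP_ATTRIBUTES = [
    {"AttributeKey": "EventName", "AttributeValue": "UpdateAutoScalingGroup"},
]


def max_size_changes(events: list[dict], asg_name: str) -> list[tuple]:
    """Return (username, new maxSize) for every UpdateAutoScalingGroup call on asg_name."""
    changes = []
    for event in events:
        record = json.loads(event["CloudTrailEvent"])
        params = record.get("requestParameters") or {}
        if params.get("autoScalingGroupName") != asg_name:
            continue  # a different group
        if "maxSize" in params:
            changes.append((event.get("Username"), params["maxSize"]))
    return changes


# With credentials configured:
#   pages = boto3.client("cloudtrail").get_paginator("lookup_events").paginate(
#       LookupAttributes=LOOKUP_ATTRIBUTES)
#   events = [e for page in pages for e in page["Events"]]
#   print(max_size_changes(events, "my-asg"))
```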

Also, just to confirm: did the max actually change from 5 to 31, or were there 31 total instances while the max was still set to 5? If the maxed-out CPU was causing frequent health check replacements, the ASG could have had more instances than the max. The max is a bound on the desired capacity, not on the total number of instances. While an instance is being replaced because of health check failures, it does not count toward the group's capacity calculations, so the total instance count can temporarily exceed the max.
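A toy model can show how this plays out. It assumes (hypothetically) that an unhealthy instance stops counting toward capacity immediately but still exists for one round while it shuts down, and that replacements launch in the same round. The desired capacity never exceeds 5, yet more than 5 instances exist at once and the cumulative number of launches keeps growing.

```python
def simulate_health_check_churn(max_size: int, desired_by_round: list[int],
                                failures_by_round: list[int]) -> tuple[int, int]:
    """Toy model: returns (total instances launched, peak instances present at once)."""
    in_service = 0
    launched_total = 0
    peak_present = 0
    for desired, failures in zip(desired_by_round, failures_by_round):
        desired = min(desired, max_size)   # MaxSize bounds desired capacity only
        failures = min(failures, in_service)
        in_service -= failures             # unhealthy instances stop counting toward capacity...
        terminating = failures             # ...but still exist while they shut down (assumption)
        launches = max(0, desired - in_service)
        launched_total += launches
        in_service += launches
        peak_present = max(peak_present, in_service + terminating)
    return launched_total, peak_present


# Scale out to 5, then every instance fails health checks in two consecutive rounds.
print(simulate_health_check_churn(5, [1, 2, 3, 4, 5, 5, 5], [0, 0, 0, 0, 0, 5, 5]))
# (15, 10)
```

Each further round of failures would add up to 5 more launches, which is one way a group capped at 5 could end up having provisioned dozens of instances over time.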

[1]

  1. Scheduled scaling: A scheduled action can be set to change the min, max, and/or desired capacity of the ASG. You would have had to create one explicitly, so I doubt it applies here.
  2. Predictive scaling, when MaxCapacityBehavior is set to IncreaseMaxCapacity. However, predictive scaling only changes the group's capacity based on the past 1-14 days of usage. It does not react to real-time changes in usage like the one you described.
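Both mechanisms in this footnote can be checked from the group's configuration. This sketch assumes the shapes of boto3's `describe_scheduled_actions` response (`ScheduledUpdateGroupActions`, each possibly carrying `MaxSize`) and `describe_policies` response (`ScalingPolicies`, with an optional `PredictiveScalingConfiguration` containing `MaxCapacityBehavior`); the sample names in the usage comment are hypothetical.

```python
def max_changing_mechanisms(scheduled_actions: list[dict], policies: list[dict]) -> list[str]:
    """List any scheduled actions or predictive policies that could raise MaxSize."""
    findings = []
    for action in scheduled_actions:
        # Case 1: a scheduled action that sets the max.
        if action.get("MaxSize") is not None:
            findings.append(
                f"scheduled action {action['ScheduledActionName']} sets MaxSize={action['MaxSize']}"
            )
    for policy in policies:
        # Case 2: predictive scaling allowed to exceed the configured max.
        config = policy.get("PredictiveScalingConfiguration") or {}
        if config.get("MaxCapacityBehavior") == "IncreaseMaxCapacity":
            findings.append(f"predictive policy {policy['PolicyName']} may increase max capacity")
    return findings


# With credentials configured:
#   client = boto3.client("autoscaling")
#   actions = client.describe_scheduled_actions(AutoScalingGroupName="my-asg")["ScheduledUpdateGroupActions"]
#   policies = client.describe_policies(AutoScalingGroupName="my-asg")["ScalingPolicies"]
#   print(max_changing_mechanisms(actions, policies))
```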
AWS
Answered 2 months ago
Reviewed by an expert 2 months ago
  • Thanks for the answer, Shahad. I'm unable to verify whether MaxCapacity actually changed from 5 to 31, since I never enabled monitoring, but what you described (health check replacements caused by the maxed-out CPU) is most likely what happened.

  • If the max isn't currently 31, and you don't see any UpdateAutoScalingGroup calls in CloudTrail lowering it back down, then it's a pretty safe bet that it never changed. Auto Scaling group metrics are free, so I'd recommend enabling them now to make future troubleshooting simpler: https://docs.aws.amazon.com/autoscaling/ec2/userguide/ec2-auto-scaling-metrics.html
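Enabling those metrics is a single `enable_metrics_collection` call. This sketch builds its arguments and assumes a hypothetical group name; `GroupMaxSize`, `GroupDesiredCapacity`, `GroupInServiceInstances`, and `GroupTotalInstances` are the group metrics that would have answered both questions in this thread (whether the max changed, and how many instances existed compared with the desired capacity).

```python
# Group metrics relevant to this thread; omitting Metrics entirely enables all of them.
RELEVANT_METRICS = (
    "GroupMaxSize",
    "GroupDesiredCapacity",
    "GroupInServiceInstances",
    "GroupTotalInstances",
)


def enable_metrics_kwargs(asg_name: str, metrics=RELEVANT_METRICS) -> dict:
    """Arguments for autoscaling.enable_metrics_collection."""
    return {
        "AutoScalingGroupName": asg_name,
        "Metrics": list(metrics),
        "Granularity": "1Minute",  # the only granularity Auto Scaling accepts
    }


# With credentials configured:
#   boto3.client("autoscaling").enable_metrics_collection(**enable_metrics_kwargs("my-asg"))
```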
