1 Answer
- What kind of rebalancing is happening? The activity history message should say. It sounds like this is from Spot rebalance notifications, or is AZ rebalancing also happening?
- What allocation strategy are you using? You should not use lowest-price with Capacity Rebalancing enabled, since the rebalance launch can end up going right back into a low-capacity pool. We recommend price-capacity-optimized for most ASG use cases, as this strategy balances both the price and the available capacity of each instance type when launching.
- How many instance types are in the ASG? Are you using attributes to define them, or an explicit list? When using Spot, we recommend at least 10 instance types (more is better, since it reduces capacity-related issues). If you're using attributes to define your list, make sure you've set the Price Protection attributes so that all matching instance types are included.
- Is there a MaxPrice set? This increases the chances of interruption, since only instance types
When the rebalance launch is attempted, the ASG will first try to launch instances into the same AZ as the termination is going to happen in (to maintain AZ balance). So if the instance which received the rebalance notification is in AZ 1a, the replacement will also be launched in 1a if there's available spot capacity. The ASG will only fail over to try one of your other AZs if EC2 doesn't return capacity in the first one being attempted.
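The checks above can be sketched as a small audit over the JSON that `describe-auto-scaling-groups` returns. This is only an illustration, not an official tool: `audit_spot_asg` and `MIN_INSTANCE_TYPES` are hypothetical names, and the function assumes it receives one entry of the `AutoScalingGroups` list as a plain dict. The field names (`CapacityRebalance`, `MixedInstancesPolicy`, `InstancesDistribution`, `SpotAllocationStrategy`, `SpotMaxPrice`, `Overrides`) are the ones that appear in that response.

```python
from __future__ import annotations

from typing import Any

# Hypothetical constant: the minimum number of instance types suggested above.
MIN_INSTANCE_TYPES = 10


def audit_spot_asg(group: dict[str, Any]) -> list[str]:
    """Return warnings for one entry of the `AutoScalingGroups` list."""
    policy = group.get("MixedInstancesPolicy")
    if policy is None:
        # Without a mixed instances policy there is no allocation strategy
        # field at all, and the group launches a single instance type.
        return ["no MixedInstancesPolicy: the group uses a single instance type"]

    warnings: list[str] = []
    distribution = policy.get("InstancesDistribution", {})
    overrides = policy.get("LaunchTemplate", {}).get("Overrides", [])

    strategy = distribution.get("SpotAllocationStrategy")
    if group.get("CapacityRebalance") and strategy == "lowest-price":
        warnings.append("lowest-price is combined with Capacity Rebalancing")

    if distribution.get("SpotMaxPrice"):
        warnings.append("SpotMaxPrice is set")

    # Attribute-based selection expands to many types, so only count explicit lists.
    uses_attributes = any("InstanceRequirements" in o for o in overrides)
    explicit_types = [o["InstanceType"] for o in overrides if "InstanceType" in o]
    if not uses_attributes and len(explicit_types) < MIN_INSTANCE_TYPES:
        warnings.append(
            f"only {len(explicit_types)} instance types listed "
            f"(suggested minimum: {MIN_INSTANCE_TYPES})"
        )
    return warnings
```

To try it, save the CLI output with `--output json`, load it with `json.load`, and pass each element of `AutoScalingGroups` to the function. An empty list means none of the three checks fired; it does not mean the group is free of interruptions.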
answered 6 months ago
Hi, answering the questions:
aws autoscaling describe-auto-scaling-groups --auto-scaling-group-name 'gitlab-runner' --output text | grep -i "*startegy*"
and it just returns nothing.
Thanks for that additional info! Based on that, what's likely happening is:
While EC2 and ASG do try to prevent behaviors like this, the feature is designed around having multiple instance types in an ASG for Spot to move between: https://docs.aws.amazon.com/autoscaling/ec2/userguide/ec2-auto-scaling-capacity-rebalancing.html#capacity-rebalancing-behavior
I would recommend you look into adding multiple instance types to the ASG for use with Spot. If that's not possible for your workload, then disabling Capacity Rebalance will reduce the churn you're seeing, but it means Spot instances will be interrupted without any proactive action from the ASG.
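As a rough sketch of the two options above, the helper below builds the keyword arguments you could pass to boto3's `update_auto_scaling_group`. The function name, the launch template name, and the instance types in the usage note are all placeholders, and setting `OnDemandPercentageAboveBaseCapacity` to 0 assumes you want an all-Spot group. It also assumes your launch template is one that can be used in a mixed instances policy. Check the result against your own setup before applying it.

```python
from __future__ import annotations

from typing import Any


def build_asg_update(
    group_name: str,
    launch_template_name: str,
    instance_types: list[str],
    capacity_rebalance: bool = True,
) -> dict[str, Any]:
    """Build kwargs for the Auto Scaling `update_auto_scaling_group` call."""
    if len(instance_types) < 2:
        # Capacity Rebalancing needs other pools to move into.
        raise ValueError("list more than one instance type for Spot")
    return {
        "AutoScalingGroupName": group_name,
        "CapacityRebalance": capacity_rebalance,
        "MixedInstancesPolicy": {
            "LaunchTemplate": {
                "LaunchTemplateSpecification": {
                    "LaunchTemplateName": launch_template_name,
                    "Version": "$Latest",
                },
                # One override per instance type the group may launch.
                "Overrides": [{"InstanceType": t} for t in instance_types],
            },
            "InstancesDistribution": {
                # Assumption: all-Spot group, no On-Demand share.
                "OnDemandPercentageAboveBaseCapacity": 0,
                "SpotAllocationStrategy": "price-capacity-optimized",
            },
        },
    }
```

With placeholder values, `build_asg_update("gitlab-runner", "my-template", ["m5.large", "m5a.large", "m6i.large"])` returns a dict you could pass as `boto3.client("autoscaling").update_auto_scaling_group(**kwargs)`. If multiple instance types really aren't an option, a plain `update_auto_scaling_group(AutoScalingGroupName="gitlab-runner", CapacityRebalance=False)` is the disable route mentioned above.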