UPDATE
Instance Maintenance Policy is now live! This allows you to set a minimum healthy percentage on the ASG (or as part of your instance refresh). Setting MinHealthyPercent to 100% means the group will launch replacement instances before terminating old ones during most replacement processes. There's also an accompanying blog post with more details.
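For anyone scripting this, here is a minimal sketch of how the policy could be applied with boto3. It assumes `autoscaling` is a client created with `boto3.client("autoscaling")`, and that the boto3 key for the setting is `MinHealthyPercentage`. The helper names and the default maximum of 200 are my own assumptions for a single-instance group, so check the limits against the current documentation before relying on them.

```python
def launch_before_terminate_policy(min_healthy=100, max_healthy=200):
    """Build a maintenance policy dict (hypothetical helper, not an AWS API).

    A minimum of 100 asks the group to launch replacements before
    terminating old instances. The default maximum of 200 is an assumption.
    """
    if not 0 <= min_healthy <= 100:
        raise ValueError("min_healthy must be between 0 and 100")
    if not 100 <= max_healthy <= 200:
        raise ValueError("max_healthy must be between 100 and 200")
    if max_healthy - min_healthy > 100:
        raise ValueError("max_healthy may exceed min_healthy by at most 100")
    return {
        "MinHealthyPercentage": min_healthy,
        "MaxHealthyPercentage": max_healthy,
    }


def apply_policy_to_group(autoscaling, asg_name, policy):
    """Set the policy on the ASG itself so it applies to replacements in general."""
    autoscaling.update_auto_scaling_group(
        AutoScalingGroupName=asg_name,
        InstanceMaintenancePolicy=policy,
    )


def start_refresh_with_policy(autoscaling, asg_name, policy):
    """Alternatively, pass the same percentages for one instance refresh only."""
    response = autoscaling.start_instance_refresh(
        AutoScalingGroupName=asg_name,
        Preferences=dict(policy),
    )
    return response["InstanceRefreshId"]
```

Passing the client in as an argument keeps the helpers easy to try against a stub before pointing them at a real group.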
Original answer below
Currently, an instance refresh always starts terminating the old instance and launching its replacement at the same time, which causes downtime in a single-instance ASG: https://docs.aws.amazon.com/autoscaling/ec2/userguide/asg-instance-refresh.html#instance-refresh-limitations
There is an internal feature request for fully launching the new instance before starting to terminate the old one, and I've added this post to it as a +1.
For now, you'll need to set the desired capacity to 2 before running the instance refresh, or, if the ASG is in a single AZ, you can just lower the desired capacity back down to 1 after the new instance has finished launching, for a similar outcome.
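As a rough illustration of the first step of that workaround, the sketch below raises the desired capacity to 2 only when the group is currently running a single instance. It assumes `autoscaling` is a boto3 Auto Scaling client (`boto3.client("autoscaling")`); the function name is hypothetical, and it refuses to proceed if the group's maximum size would not allow a second instance.

```python
def scale_out_single_instance_group(autoscaling, asg_name):
    """Raise desired capacity from 1 to 2 (hypothetical helper).

    Returns True if the capacity was changed, False if the group was not
    a single-instance group and nothing was done.
    """
    groups = autoscaling.describe_auto_scaling_groups(
        AutoScalingGroupNames=[asg_name]
    )["AutoScalingGroups"]
    if not groups:
        raise LookupError(f"Auto Scaling group not found: {asg_name}")

    group = groups[0]
    if group["DesiredCapacity"] != 1:
        # Only the single-instance case needs the workaround.
        return False
    if group["MaxSize"] < 2:
        # The desired capacity cannot exceed the group's maximum size.
        raise ValueError("MaxSize must be at least 2 to add a second instance")

    autoscaling.set_desired_capacity(
        AutoScalingGroupName=asg_name,
        DesiredCapacity=2,
        HonorCooldown=False,
    )
    return True
```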
Hi,
Basically, in the case of 1 instance, we can't really use the instance refresh mechanism if we want zero downtime. What I found people doing is what Shahad suggested, i.e.: determine that you are running only 1 instance -> increase the desired capacity -> decrease it back. The problem I'm having with this solution is the timing. In my case, it's a Lambda that triggers the instance refresh after a new AMI is available. So, in the case of 1 running instance, that Lambda could indeed increase the desired capacity to 2, but doesn't it have to wait until the new instance is fully available before it resets the desired capacity back to 1 on the ASG? Otherwise we are back to the dreaded downtime. I could force the Lambda to wait with a loop that checks the status, but that's anything but elegant, not to mention the billed time that incurs. Any ideas? Is there any event we could listen for?
I have read about the 'create_before_destroy' Terraform solution, which basically duplicates the whole ASG to ensure high availability. What do you guys think about it?
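On the question of an event to listen for: one option that might avoid the polling loop is an EventBridge rule on the Auto Scaling "EC2 Instance Launch Successful" event, targeting a second, short-lived Lambda that scales back in. The sketch below assumes `autoscaling` is a boto3 Auto Scaling client and that the function and parameter names are my own. Note the assumptions: the event may arrive before your application passes its own health checks (a lifecycle hook could delay it), and which instance gets terminated on scale-in depends on the group's termination policy.

```python
LAUNCH_SUCCESS = "EC2 Instance Launch Successful"


def handle_launch_event(event, autoscaling, expected_asg):
    """Scale back to 1 once the extra instance has launched (hypothetical handler).

    Returns True if the desired capacity was reset, False if the event
    was not a launch-success event for the expected group.
    """
    if event.get("source") != "aws.autoscaling":
        return False
    if event.get("detail-type") != LAUNCH_SUCCESS:
        return False
    if event.get("detail", {}).get("AutoScalingGroupName") != expected_asg:
        return False

    # Assumption: the termination policy removes the old instance, not the new one.
    autoscaling.set_desired_capacity(
        AutoScalingGroupName=expected_asg,
        DesiredCapacity=1,
        HonorCooldown=False,
    )
    return True
```

With this split, the first Lambda only raises the capacity and exits, so nothing is billed while the new instance boots.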