Issue with Instance Refresh in Auto Scaling Group and CodeDeploy: Continuous Loop with Corrupted AMI


Does anyone have experience with Instance Refresh in an Auto Scaling group working together with CodeDeploy? I have an issue with updating the AMI/launch template in an Auto Scaling group using Instance Refresh and installing the latest version of the application on each new EC2 instance. Everything works fine when the AMI has no errors and the CodeDeploy deployment succeeds. The problem arises when the new AMI is corrupted and CodeDeploy cannot complete the deployment, failing, for example, during the 'AfterInstall' lifecycle event.

In such a situation, the Auto Scaling group enters an infinite loop, terminating the EC2 instances on which the new application failed to deploy and creating new instances with the same corrupted AMI, repeating the entire situation in a loop. The Instance Refresh does not respond to this; it always has 0% progress. In this situation, how does AWS know whether Instance Refresh should perform a rollback or CodeDeploy should handle it? And why does it keep launching and terminating instances with the same AMI?

1 Answer

How long has the loop been running? Auto Scaling should fail the refresh after 1 hour with no progress (i.e., being stuck at 0%).
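To see whether a refresh is actually sitting at 0%, you can poll its status. Below is a minimal sketch using boto3's `describe_instance_refreshes`. The helper names (`stalled_refreshes`, `check_group`) are made up for illustration, and measuring elapsed time from `StartTime` is only an approximation of "no progress", not how Auto Scaling itself tracks the timeout.

```python
from datetime import datetime, timedelta, timezone

# Statuses in which a refresh is still running (other states are terminal
# or transitional and are ignored here).
ACTIVE_STATUSES = {"Pending", "InProgress"}


def stalled_refreshes(response, now=None, max_age=timedelta(hours=1)):
    """Return IDs of active refreshes that are still at 0% after max_age.

    `response` is the dict returned by describe_instance_refreshes.
    """
    now = now or datetime.now(timezone.utc)
    stalled = []
    for refresh in response.get("InstanceRefreshes", []):
        if refresh.get("Status") not in ACTIVE_STATUSES:
            continue
        # PercentageComplete may be absent while the refresh is pending.
        if refresh.get("PercentageComplete", 0) > 0:
            continue
        if now - refresh["StartTime"] >= max_age:
            stalled.append(refresh["InstanceRefreshId"])
    return stalled


def check_group(group_name):
    """Hypothetical wrapper: needs boto3 and AWS credentials configured."""
    import boto3

    client = boto3.client("autoscaling")
    response = client.describe_instance_refreshes(
        AutoScalingGroupName=group_name
    )
    return stalled_refreshes(response)
```

Calling `check_group("my-asg")` (a placeholder group name) returns the IDs of any refreshes that look stuck.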

Auto Scaling has no way to know whether the failure is a one-off, so it keeps retrying until its timeout is hit. At that point, if you specified a new desired configuration and enabled rollback when starting the Instance Refresh, it goes back to the previous launch template version, which should launch instances successfully and get your group back to a stable state.
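As a rough sketch of what "a desired configuration with rollback enabled" can look like in boto3: the request below passes `DesiredConfiguration` and sets `AutoRollback` in `Preferences` when calling `start_instance_refresh`. The function names, the `MinHealthyPercentage` value of 90, and all arguments are assumptions for illustration, so adjust them to your group.

```python
def build_refresh_request(group_name, launch_template_id, version):
    """Build keyword arguments for start_instance_refresh.

    Uses an explicit launch template version so there is a concrete
    previous configuration to return to if the refresh fails.
    """
    return {
        "AutoScalingGroupName": group_name,
        "Strategy": "Rolling",
        "DesiredConfiguration": {
            "LaunchTemplate": {
                "LaunchTemplateId": launch_template_id,
                "Version": str(version),
            }
        },
        "Preferences": {
            "MinHealthyPercentage": 90,  # assumed value, tune as needed
            "AutoRollback": True,        # roll back if the refresh fails
        },
    }


def start_refresh(group_name, launch_template_id, version):
    """Hypothetical wrapper: needs boto3 and AWS credentials configured."""
    import boto3

    client = boto3.client("autoscaling")
    request = build_refresh_request(group_name, launch_template_id, version)
    response = client.start_instance_refresh(**request)
    return response["InstanceRefreshId"]
```

Keeping the request builder separate from the API call makes it easy to inspect or log the exact configuration before starting the refresh.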

You also have the option to cancel the refresh if something is going wrong, which speeds up the failure.
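Cancelling can be scripted as well. This is a minimal sketch around boto3's `cancel_instance_refresh`; the `cancel_refresh` helper is a made-up name, and it takes the client as a parameter so it can be exercised without AWS access.

```python
def cancel_refresh(client, group_name):
    """Cancel the in-progress refresh for a group and return its ID.

    `client` is expected to be a boto3 Auto Scaling client, e.g.
    boto3.client("autoscaling"). The call raises an error if the group
    has no active instance refresh.
    """
    response = client.cancel_instance_refresh(
        AutoScalingGroupName=group_name
    )
    return response["InstanceRefreshId"]
```

With real credentials this would be called as `cancel_refresh(boto3.client("autoscaling"), "my-asg")`, where `my-asg` is a placeholder group name.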

AWS
answered 9 months ago
