Issue with Azure DevOps agents after AWS instance refresh in Auto Scaling Group

Hi everyone! I am new to AWS.

I have three EC2 instances running in an Auto Scaling group. They host Azure DevOps agents and are provisioned from a specific AWS Service Catalog product. I created a pipeline that checks whether a new version of this product is available. If there is, a bash script fetches the latest version and runs the "aws autoscaling start-instance-refresh" command to replace the existing EC2 agents with new ones. The issue is that after the instance refresh completes, only two of the three EC2 instances have the Azure DevOps agent service running.

This is how I use the command:

aws autoscaling start-instance-refresh \
  --auto-scaling-group-name "$ASG_NAME" \
  --preferences '{"InstanceWarmup": 3000, "MinHealthyPercentage": 90}' \
  --region eu-west-1

2 Answers

The issue you're experiencing with your Azure DevOps agents after an AWS instance refresh in your Auto Scaling Group (ASG) could be related to a few factors. Let's explore some potential causes and solutions:

  1. Instance Refresh Process: When you start an instance refresh, Amazon EC2 Auto Scaling gradually replaces the instances in your Auto Scaling group. The command you're using specifies a MinHealthyPercentage of 90%, which is the share of the group's desired capacity that must stay in service and healthy while the refresh runs. With only three instances, 90% cannot be met exactly, so in practice the refresh replaces one instance at a time. This slows the refresh down, but it should not by itself leave an instance on the old configuration. Check the refresh status and the activity history to confirm that all three instances were actually replaced.

  2. InstanceWarmup Setting: You've set the InstanceWarmup to 3000 seconds (50 minutes). This is the time Auto Scaling waits after a new instance is launched before considering it ready. Ensure this time is sufficient for your Azure DevOps agent to start and become operational.

  3. Auto Scaling Group Configuration: Check if your ASG is configured to maintain exactly three instances. If it's set to a range (e.g., minimum 2, desired 3, maximum 3), it might not immediately launch a third instance after the refresh.

  4. Instance Configuration: Verify that the new AWS Service Catalog product version correctly includes the setup for the Azure DevOps agent. The issue might be in the product configuration rather than the refresh process.
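Points 3 and 4 can be checked from the command line before changing anything. The following is a minimal sketch, assuming the AWS CLI is configured with read access to the group; the function names are made up for illustration, and the group name and region are whatever your pipeline already uses.

```shell
# Sketch only: function names are hypothetical helpers, not part of the AWS CLI.

show_asg_capacity() {
  # $1 = Auto Scaling group name, $2 = region
  # Prints MinSize, DesiredCapacity and MaxSize on one tab-separated line.
  aws autoscaling describe-auto-scaling-groups \
    --auto-scaling-group-names "$1" \
    --region "$2" \
    --query 'AutoScalingGroups[0].[MinSize,DesiredCapacity,MaxSize]' \
    --output text
}

show_asg_instances() {
  # $1 = Auto Scaling group name, $2 = region
  # Prints one line per instance: instance ID, lifecycle state, health status.
  aws autoscaling describe-auto-scaling-groups \
    --auto-scaling-group-names "$1" \
    --region "$2" \
    --query 'AutoScalingGroups[0].Instances[].[InstanceId,LifecycleState,HealthStatus]' \
    --output text
}

# Usage:
#   show_asg_capacity "$ASG_NAME" eu-west-1
#   show_asg_instances "$ASG_NAME" eu-west-1
```

If the capacity line shows three for the desired capacity and all three instances are InService and Healthy, the group itself is probably fine and the problem is more likely inside the instance.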

To address this issue:

  1. Consider using an instance refresh with checkpoints. This allows you to pause the refresh at certain points to verify that your applications are working correctly.

  2. Adjust the MinHealthyPercentage to a lower value (e.g., 66%) to give the refresh more room to take instances out of service while it replaces them.

  3. After the refresh, manually check the status of your instances and the Azure DevOps agent service. You might need to manually start the service on the instance where it's not running.

  4. If the issue persists, you may want to implement a custom health check in your Auto Scaling group that verifies the Azure DevOps agent is running before considering an instance healthy.

  5. Consider using lifecycle hooks in your Auto Scaling group. These can help ensure that your Azure DevOps agent is fully set up and running before the instance is put into service.
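As a rough illustration of point 5, the sketch below could run at the end of the instance's boot script. It assumes a Linux instance where the agent runs as a systemd unit, and it assumes a launch lifecycle hook already exists on the group. The variables HOOK_NAME, ASG_NAME, INSTANCE_ID, REGION and AGENT_UNIT are placeholders you would need to set yourself; none of them come from the original post.

```shell
# Sketch only: all variable and function names are hypothetical placeholders.

wait_for_unit() {
  # $1 = systemd unit, $2 = number of attempts, $3 = seconds between attempts
  # Polls systemd until the unit is active; gives up after the given attempts.
  unit="$1"; attempts="${2:-30}"; delay="${3:-10}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if systemctl is-active --quiet "$unit"; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

complete_launch_hook() {
  # $1 = CONTINUE (put the instance in service) or ABANDON (terminate it so
  # the group launches a replacement).
  aws autoscaling complete-lifecycle-action \
    --lifecycle-hook-name "$HOOK_NAME" \
    --auto-scaling-group-name "$ASG_NAME" \
    --instance-id "$INSTANCE_ID" \
    --lifecycle-action-result "$1" \
    --region "$REGION"
}

report_agent_state() {
  # Tells Auto Scaling whether the agent service came up on this instance.
  if wait_for_unit "$AGENT_UNIT" "${ATTEMPTS:-30}" "${DELAY:-10}"; then
    complete_launch_hook CONTINUE
  else
    complete_launch_hook ABANDON
  fi
}

# Usage (at the end of the boot script, after the agent install step):
#   report_agent_state
```

With a hook like this, an instance whose agent never starts is abandoned and replaced instead of going into service without an agent.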

Remember, when making changes to your Auto Scaling group or starting an instance refresh, it's important to monitor the process closely, especially in production environments. You may want to test these changes in a non-production environment first to ensure they work as expected with your specific setup.
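For the monitoring part, one option is to have the pipeline wait for the refresh to finish and fail the run if it does not succeed. This is a minimal sketch, assuming the refresh ID is captured from the output of start-instance-refresh; the function names and the polling interval are illustrative choices, not anything from the original script.

```shell
# Sketch only: function names are hypothetical helpers.

refresh_status() {
  # $1 = Auto Scaling group name, $2 = instance refresh ID, $3 = region
  # Prints the Status field of that instance refresh.
  aws autoscaling describe-instance-refreshes \
    --auto-scaling-group-name "$1" \
    --instance-refresh-ids "$2" \
    --region "$3" \
    --query 'InstanceRefreshes[0].Status' \
    --output text
}

wait_for_refresh() {
  # $1 = group name, $2 = refresh ID, $3 = region, $4 = seconds between polls
  # Returns 0 on Successful, 1 on any other terminal status, 2 if the CLI fails.
  asg="$1"; refresh_id="$2"; region="$3"; delay="${4:-60}"
  while :; do
    status=$(refresh_status "$asg" "$refresh_id" "$region") || return 2
    echo "instance refresh $refresh_id: $status"
    case "$status" in
      Successful) return 0 ;;
      Failed|Cancelled|RollbackFailed|RollbackSuccessful) return 1 ;;
    esac
    sleep "$delay"
  done
}

# Usage:
#   REFRESH_ID=$(aws autoscaling start-instance-refresh \
#     --auto-scaling-group-name "$ASG_NAME" \
#     --preferences '{"InstanceWarmup": 3000, "MinHealthyPercentage": 90}' \
#     --region eu-west-1 \
#     --query InstanceRefreshId --output text)
#   wait_for_refresh "$ASG_NAME" "$REFRESH_ID" eu-west-1
```

A successful refresh only means the instances were replaced and passed the group's health checks. It does not prove the agent service is running, so a separate check on the agent pool is still needed.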
Sources
Use an instance refresh to update instances in an Auto Scaling group - Amazon EC2 Auto Scaling
Update an Auto Scaling group - Amazon EC2 Auto Scaling
RefreshPreferences - Amazon EC2 Auto Scaling
Auto Scaling groups - Amazon EC2 Auto Scaling

answered a month ago
  • The automatic Agent answer is a bit iffy on some of the bullets for this one, FYI

Check the activity history of the ASG to see if it actually replaced all 3 instances. It's possible the default SkipMatching parameter made it so the ASG didn't replace all instances if they're on different versions.

If all the instances were replaced, then the Instance Refresh itself worked correctly, and you'll want to look at logs inside the instance to see why the Agent didn't start up correctly. Might be some sort of concurrency issue with the old instances still running as the new ones are starting?
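If it helps, both checks can be done from the CLI. This is just a sketch with made-up function names, assuming the CLI has read access to the group:

```shell
# Sketch only: function names are hypothetical helpers.

list_scaling_activities() {
  # $1 = Auto Scaling group name, $2 = region, $3 = max number of activities
  # Prints start time, status code and description of recent scaling activities.
  aws autoscaling describe-scaling-activities \
    --auto-scaling-group-name "$1" \
    --region "$2" \
    --max-items "${3:-20}" \
    --query 'Activities[].[StartTime,StatusCode,Description]' \
    --output text
}

list_instance_refreshes() {
  # $1 = Auto Scaling group name, $2 = region
  # Prints start time, status, percentage complete and status reason per refresh.
  aws autoscaling describe-instance-refreshes \
    --auto-scaling-group-name "$1" \
    --region "$2" \
    --query 'InstanceRefreshes[].[StartTime,Status,PercentageComplete,StatusReason]' \
    --output text
}

# Usage:
#   list_scaling_activities "$ASG_NAME" eu-west-1
#   list_instance_refreshes "$ASG_NAME" eu-west-1
```

For the in-instance side, if the agent is installed from user data on a Linux instance, /var/log/cloud-init-output.log is usually the first place to look. That is an assumption about how the Service Catalog product installs the agent.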

AWS
EXPERT
answered a month ago
  • Hi Shalad. I checked the ASG activity and everything went fine. It replaced all the instances. What I don't understand is that it works perfectly in other environments (ACC) but not in TST. Sometimes, if I stop the instance on which the Azure DevOps agent is not installed, a new one comes up (because of the ASG), and THEN that instance is visible in the Azure DevOps agent pool! I ran the pipeline again today because there was a new version, and the 1st instance didn't have the agent installed, but the other two did. The 1st instance was replaced at 2024/10/08 19:10 (agent service not installed). The 2nd instance was replaced at 2024/10/08 19:30 (agent service installed). The 3rd instance was replaced at 2024/10/08 19:50 (agent service installed).
