
Issue during scale-out event in Auto Scaling group integrated with CodeDeploy

0

Hello, I have a CodeDeploy application and deployment group configured with an Auto Scaling group. When I increase the desired instance count in the Auto Scaling group, I can see an Auto Scaling group action in CodeDeploy that deploys my code to the new instance, and I validated the instance ID against the Auto Scaling activities tab. After a minute or two, however, the same instance gets terminated, with this message in the Auto Scaling activities tab: "an instance was taken out of service in response to a launch failure". The state transition reason on the terminated instance shows that it was user initiated. I confirmed that the CodeDeploy agent is installed and running on the newly created instance. I checked the IAM role, and it has access to the S3 bucket where the revision is stored. I checked the security group and outbound rules for the instance, and they allow HTTPS. I also checked the availability settings of my subnet and the Auto Scaling group, and both are set for the same regions. What am I missing?

2 Answers
0

What's the cause listed on the launch failure? Does it say "Due to ABANDON"? If so, the deployment in CodeDeploy is failing, and CodeDeploy is telling the ASG to fail the launch (by sending a complete-lifecycle-action call with ABANDON as the result). Check the deployment to see which step it's failing at. For more details, you'll need to get the deployment logs from inside the instance.
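As a rough sketch of that first check, the scaling activities can be pulled with boto3 and scanned for the word ABANDON. The group name `my-asg` is a placeholder, and the substring match is an assumption about how the message is worded, not a documented format.

```python
def summarize_activities(activities):
    """Condense the "Activities" list returned by describe_scaling_activities."""
    summaries = []
    for activity in activities:
        fields = [activity.get(key, "") for key in ("Description", "Cause", "StatusMessage")]
        text = " | ".join(field for field in fields if field)
        summaries.append({
            "activity_id": activity.get("ActivityId"),
            "status": activity.get("StatusCode"),
            # Assumption: an abandoned lifecycle action mentions ABANDON somewhere.
            "abandoned": "ABANDON" in text.upper(),
            "text": text,
        })
    return summaries


def fetch_activities(group_name, client=None, max_records=20):
    """Fetch recent scaling activities for one Auto Scaling group."""
    if client is None:
        import boto3  # imported lazily so summarize_activities works without AWS access
        client = boto3.client("autoscaling")
    response = client.describe_scaling_activities(
        AutoScalingGroupName=group_name, MaxRecords=max_records
    )
    return response["Activities"]


# Usage (placeholder group name):
# for row in summarize_activities(fetch_activities("my-asg")):
#     print(row["status"], row["abandoned"], row["text"])
```

Printing the full text of each activity also gives you the "Cause" and "Description" values in one place.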

If you're not automatically pushing these logs out, you'll need to save the instance before it's terminated, or set its EBS volume to not be deleted on instance termination. The simplest option is to quickly send a complete-lifecycle-action call with CONTINUE and then detach the instance from the ASG. The deployment in CodeDeploy will still continue, but the instance will be a standalone EC2 instance, so it won't be terminated when the deployment fails, and you can take a look at the logs to see what happened.
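A minimal sketch of the CONTINUE-then-detach approach with boto3 could look like the following. The function name, the polling loop, and the assumption that the instance must reach InService before it can be detached are mine; the hook name has to be looked up for your group (for example with `describe_lifecycle_hooks`).

```python
import time


def keep_instance_for_debugging(client, group_name, hook_name, instance_id,
                                decrement_capacity=False, poll_seconds=5,
                                max_polls=60, sleep=time.sleep):
    """Complete the launch hook with CONTINUE, then detach the instance from the ASG."""
    # 1. Tell the ASG the launch hook succeeded so it does not terminate the instance.
    client.complete_lifecycle_action(
        AutoScalingGroupName=group_name,
        LifecycleHookName=hook_name,
        InstanceId=instance_id,
        LifecycleActionResult="CONTINUE",
    )
    # 2. Assumption: detaching only works once the instance is InService, so poll for it.
    for _ in range(max_polls):
        response = client.describe_auto_scaling_instances(InstanceIds=[instance_id])
        states = [item["LifecycleState"] for item in response["AutoScalingInstances"]]
        if states == ["InService"]:
            break
        sleep(poll_seconds)
    else:
        raise TimeoutError(f"{instance_id} did not reach InService")
    # 3. Detach so the ASG no longer manages (or terminates) the instance.
    client.detach_instances(
        AutoScalingGroupName=group_name,
        InstanceIds=[instance_id],
        ShouldDecrementDesiredCapacity=decrement_capacity,
    )


# Usage (all names are placeholders):
# import boto3
# keep_instance_for_debugging(boto3.client("autoscaling"), "my-asg",
#                             "my-launch-hook", "i-0123456789abcdef0")
```

With `decrement_capacity=False` the group launches a replacement instance, which will likely hit the same failure; passing `True` avoids that but can be rejected if it would take the group below its minimum size.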

AWS
EXPERT
answered 13 days ago
  • Hello, two different things happen when the ASG sends the scaling group action to CodeDeploy:

    1. The ASG sends the termination signal before CodeDeploy even reaches my script. The ApplicationStop lifecycle event started at 12:02 and took less than a second, but the next lifecycle event, DownloadBundle, started at 12:07, and the termination event arrived during it. CodeDeploy never reached my script.
    2. The ASG sends the termination event while my script is running.

    In both cases my script did not complete.
  • What are the full "Cause" and "Description" fields? (redacting out any sensitive info)
    These should give more info on why the launch failed

  • During the DownloadBundle failure: CodeDeploy agent was not able to receive the lifecycle event. Check the CodeDeploy agent logs on your host and make sure the agent is running and can connect to the CodeDeploy server.

    During the AfterInstall failure: [stderr] [stderr]Session terminated, killing shell... ...killed.

    There were some more errors during AllowTraffic:

    1. The EC2 instance did not turn into the expected healthy state in target group. In "ASG_GROUP": {State: unused,Reason: Target.NotRegistered,Description: Target is not registered to the target group} timed out when trying to achieve ELB registration goal.
    2. The following targets are not in a running state and cannot be registered: 'INSTANCE_ID'

  • CodeDeploy agent was not able to receive the lifecycle event

    This sounds like the main issue. If the agent didn't connect correctly, it wouldn't be able to send a heartbeat to the ASG to extend the hook timeout. Make sure the most up-to-date version of the agent is installed, and make sure the instance has outbound connectivity.

    If that doesn't fix it, you might want to open a support case so someone can look at your actual resources and troubleshoot in more detail.
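To see how long the launch hook waits before the ASG gives up, one option is to list the group's lifecycle hooks and compare each timeout with the time the deployment needs. This is only a sketch: `needed_seconds` is an estimate you supply yourself, and `my-asg` is a placeholder.

```python
LAUNCH_TRANSITION = "autoscaling:EC2_INSTANCE_LAUNCHING"


def short_launch_hooks(hooks, needed_seconds):
    """Return (hook name, timeout) pairs for launch hooks shorter than needed_seconds."""
    too_short = []
    for hook in hooks:
        if hook.get("LifecycleTransition") != LAUNCH_TRANSITION:
            continue  # termination hooks are not relevant to a scale-out failure
        timeout = hook.get("HeartbeatTimeout", 0)
        if timeout < needed_seconds:
            too_short.append((hook.get("LifecycleHookName"), timeout))
    return too_short


def fetch_hooks(group_name, client=None):
    """Fetch the "LifecycleHooks" list for one Auto Scaling group."""
    if client is None:
        import boto3  # imported lazily so short_launch_hooks works without AWS access
        client = boto3.client("autoscaling")
    return client.describe_lifecycle_hooks(AutoScalingGroupName=group_name)["LifecycleHooks"]


# Usage (placeholder values): flag launch hooks that wait less than 15 minutes.
# print(short_launch_hooks(fetch_hooks("my-asg"), needed_seconds=900))
```

If a hook shows up here and no heartbeat extends it, the instance can be terminated while a lifecycle event is still running, which would match the timeline described above.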

0

Based on the information provided, it appears you're experiencing an issue with your Auto Scaling group where new instances are being terminated shortly after launch during a scale-out event. This problem can occur for several reasons, and here are some potential causes and solutions to consider:

  1. CodeDeploy Deployment Failure: It's possible that the CodeDeploy deployment is failing on the new instance, causing the Auto Scaling group to mark it as unhealthy and terminate it. Check the CodeDeploy logs on the instance (if possible before termination) or in CloudWatch Logs for any deployment errors.

  2. Instance Health Checks: Ensure that your Auto Scaling group's health check grace period is long enough to allow the CodeDeploy deployment to complete. If the grace period is too short, the instance might be terminated before the deployment finishes.

  3. Application Startup Issues: If your application fails to start properly after deployment, it could cause the instance to fail health checks. Review your application logs for any startup errors.

  4. IAM Permissions: While you mentioned checking IAM roles for S3 access, ensure that the instance role has all necessary permissions for both CodeDeploy and your application to function correctly.

  5. Network Configuration: Verify that the instance can communicate with all required services and endpoints. This includes CodeDeploy service endpoints, your application dependencies, and any other AWS services you're using.

  6. Resource Constraints: Check if there are any resource constraints (e.g., CPU, memory) that might be causing the application or CodeDeploy agent to fail.

  7. Lifecycle Hooks: If you're using Auto Scaling lifecycle hooks, ensure they are configured correctly and not causing unexpected behavior.

  8. CodeDeploy Agent Issues: Although you confirmed the agent is installed and running, check if it's the latest version and properly configured.
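For point 2, a small sketch of reading and raising the health check grace period with boto3 is shown below. The function names and `needed_seconds` are illustrative assumptions. Keep in mind that the grace period only starts once the instance is InService, so it covers health checks after the launch hook has finished, not the time spent waiting on the hook itself.

```python
def grace_period_update(group, needed_seconds):
    """Return kwargs for update_auto_scaling_group, or None if no change is needed."""
    current = group.get("HealthCheckGracePeriod", 0)
    if current >= needed_seconds:
        return None
    return {
        "AutoScalingGroupName": group["AutoScalingGroupName"],
        "HealthCheckGracePeriod": needed_seconds,
    }


def ensure_grace_period(client, group_name, needed_seconds):
    """Raise the group's grace period to needed_seconds if it is currently shorter."""
    response = client.describe_auto_scaling_groups(AutoScalingGroupNames=[group_name])
    groups = response["AutoScalingGroups"]
    if not groups:
        raise ValueError(f"Auto Scaling group {group_name!r} not found")
    kwargs = grace_period_update(groups[0], needed_seconds)
    if kwargs is not None:
        client.update_auto_scaling_group(**kwargs)
    return kwargs


# Usage (placeholder values):
# import boto3
# ensure_grace_period(boto3.client("autoscaling"), "my-asg", needed_seconds=600)
```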

To troubleshoot further:

  1. Increase logging verbosity for CodeDeploy and your application.
  2. Temporarily disable automatic termination of unhealthy instances to give you more time to investigate.
  3. Review the CodeDeploy deployment logs in detail.
  4. Check the system logs of the instance for any errors or warnings.
  5. Verify that all necessary ports are open in your security groups for both inbound and outbound traffic.
  6. Consider using EC2 Instance Connect or Systems Manager Session Manager to access the instance before it terminates to gather more information.
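If the instance is already gone, the per-instance lifecycle events recorded by CodeDeploy can still be read through the API. The sketch below uses boto3's `get_deployment_target` and assumes an EC2/On-Premises deployment where the target ID is the instance ID; the deployment ID and instance ID in the usage comment are placeholders.

```python
def failed_events(lifecycle_events):
    """Pick out failed lifecycle events together with their diagnostics."""
    failures = []
    for event in lifecycle_events:
        if event.get("status") != "Failed":
            continue
        diagnostics = event.get("diagnostics") or {}
        failures.append({
            "event": event.get("lifecycleEventName"),
            "error_code": diagnostics.get("errorCode"),
            "script": diagnostics.get("scriptName"),
            "message": diagnostics.get("message"),
            "log_tail": diagnostics.get("logTail"),
        })
    return failures


def fetch_lifecycle_events(deployment_id, instance_id, client=None):
    """Fetch the lifecycle events CodeDeploy recorded for one instance."""
    if client is None:
        import boto3  # imported lazily so failed_events works without AWS access
        client = boto3.client("codedeploy")
    response = client.get_deployment_target(
        deploymentId=deployment_id, targetId=instance_id
    )
    return response["deploymentTarget"]["instanceTarget"]["lifecycleEvents"]


# Usage (placeholder IDs):
# events = fetch_lifecycle_events("d-EXAMPLE", "i-0123456789abcdef0")
# for failure in failed_events(events):
#     print(failure["event"], failure["message"])
```

The `log_tail` field, when present, holds the last lines of script output for the failed event, which is often enough to tell a script error apart from an instance that was terminated mid-run.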

If the issue persists after checking these areas, you may need to engage AWS support for a more in-depth investigation of your specific configuration and deployment process.
Sources
Troubleshoot Amazon EC2 Auto Scaling issues - AWS CodeDeploy
Codedeploy fails when autoscaling group is empty | AWS re:Post
Why is my Auto Scaling group not launching new instances when the load increases? | AWS re:Post

answered 13 days ago
