- Newest
- Most votes
- Most comments
What's the cause listed on the launch failure? Does it say "Due to ABANDON"? If so, then the Deployment in CodeDeploy is failing, and CodeDeploy is telling ASG to fail the launch (by sending a complete-lifecycle-action call with ABANDON as the result). Check the Deployment to see which step its failing at. For more details, you'll need to get the deployment logs from inside the instance.
If you're not automatically pushing these out, then you'll need to save the instance before its terminated, or set its EBS volume to not be deleted on instance termination. The simplest option is to quickly send a complete-lifecycle-action call with CONTINUE, and then detach the instance from the ASG. The Deployment in CodeDeploy will still continue, but it'll be a standalone EC2 instance, so it won't be terminated when the deployment fails, and you can take look in the logs to see what happened
Based on the information provided, it appears you're experiencing an issue with your Auto Scaling group where new instances are being terminated shortly after launch during a scale-out event. This problem can occur for several reasons, and here are some potential causes and solutions to consider:
-
CodeDeploy Deployment Failure: It's possible that the CodeDeploy deployment is failing on the new instance, causing the Auto Scaling group to mark it as unhealthy and terminate it. Check the CodeDeploy logs on the instance (if possible before termination) or in CloudWatch Logs for any deployment errors.
-
Instance Health Checks: Ensure that your Auto Scaling group's health check grace period is long enough to allow the CodeDeploy deployment to complete. If the grace period is too short, the instance might be terminated before the deployment finishes.
-
Application Startup Issues: If your application fails to start properly after deployment, it could cause the instance to fail health checks. Review your application logs for any startup errors.
-
IAM Permissions: While you mentioned checking IAM roles for S3 access, ensure that the instance role has all necessary permissions for both CodeDeploy and your application to function correctly.
-
Network Configuration: Verify that the instance can communicate with all required services and endpoints. This includes CodeDeploy service endpoints, your application dependencies, and any other AWS services you're using.
-
Resource Constraints: Check if there are any resource constraints (e.g., CPU, memory) that might be causing the application or CodeDeploy agent to fail.
-
Lifecycle Hooks: If you're using Auto Scaling lifecycle hooks, ensure they are configured correctly and not causing unexpected behavior.
-
CodeDeploy Agent Issues: Although you confirmed the agent is installed and running, check if it's the latest version and properly configured.
To troubleshoot further:
- Increase logging verbosity for CodeDeploy and your application.
- Temporarily disable automatic termination of unhealthy instances to give you more time to investigate.
- Review the CodeDeploy deployment logs in detail.
- Check the system logs of the instance for any errors or warnings.
- Verify that all necessary ports are open in your security groups for both inbound and outbound traffic.
- Consider using EC2 Instance Connect or Systems Manager Session Manager to access the instance before it terminates to gather more information.
If the issue persists after checking these areas, you may need to engage AWS support for a more in-depth investigation of your specific configuration and deployment process.
Sources
Troubleshoot Amazon EC2 Auto Scaling issues - AWS CodeDeploy
Codedeploy fails when autoscaling group is empty | AWS re:Post
Why is my Auto Scaling group not launching new instances when the load increases? | AWS re:Post
Relevant content
- asked 2 years ago
- asked a year ago
- AWS OFFICIALUpdated a year ago
Hello, there are a few different things that are happening, when ASG is sending scaling group action to code deploy.
What are the full "Cause" and "Description" fields? (redacting out any sensitive info)
These should give more info on why the launch failed
During the download bundle failure: CodeDeploy agent was not able to receive the lifecycle event. Check the CodeDeploy agent logs on your host and make sure the agent is running and can connect to the CodeDeploy server.
and during the afterInstall failure: [stderr] [stderr]Session terminated, killing shell... ...killed.
there were some more errors: during AllowTraffic: 1.The EC2 instance did not turn into the expected healthy state in target group. In "ASG_GROUP": {State: unused,Reason: Target.NotRegistered,Description: Target is not registered to the target group} timed out when trying to achieve ELB registration goal. 2. The following targets are not in a running state and cannot be registered: 'INSTANCE_ID'
This sounds like the main issue. If the Agent didn't connect correctly, it wouldn't be able to send a heartbeat to the ASG to extend the hook timeout. Make sure the most up to date version of the Agent is installed. Also make sure the instance has outbound connectivity.
If that doesn't fix it, you might want to open a support case so someone can take a look at your actual resources to troubleshoot in more detail