Why do new EC2 instances initially show as unhealthy in ALB Target Group during both blue-green deployments and autoscaling?


Hi all,

I’m using a blue-green deployment strategy in AWS for my production application and facing an issue with initial health check failures in ALB Target Groups.

Setup: Two ALBs:

test-alb (used to validate new instances)

main-alb (serves production traffic)

Deployment strategy:

I launch new instances via an Auto Scaling Group.

Initially register them to a Target Group attached to test-alb to verify their health.

Once they become healthy, I attach the same Target Group to main-alb for serving traffic.

Health check path: /

Health check port: 80

Grace period in Auto Scaling Group: 300 seconds

Security Groups allow inbound on port 80.

The Problem: When I first launch new instances, they appear as unhealthy in the ALB Target Group (test-alb) for some time before eventually becoming healthy. The same thing happens during autoscaling: newly launched instances show up as unhealthy for a short period. Once the application fully starts, the instances are marked healthy and start serving traffic.

Constraints: I do not want to increase the health check timeout or interval. I want to avoid receiving traffic on new instances until they are marked healthy.

My Questions:

Why do newly launched EC2 instances show as unhealthy initially even though they become healthy later?

Is there a way to delay the registration of EC2 instances to the Target Group until the application is ready to serve traffic?

What’s the best practice to avoid these health check failures during both blue-green deployments and autoscaling, without relaxing the health check settings?

Any suggestions to improve this setup or mitigate the initial "unhealthy" state would be really appreciated!

2 Answers

Hi there. Can you provide a measurement (in seconds) of how long it takes instances to appear as healthy? How does this compare to the health check interval? I understand you don't want to modify the health check.

AWS EXPERT, answered 14 days ago

Everything you're describing sounds like expected behavior, and it should usually be fine.

ALB:

  • ALB will start doing health checks as soon as the instances are registered.
  • If the application isn't running, the instance will fail the ALB health checks, and once the UnhealthyThresholdCount has been reached, the ALB will mark the instance as Unhealthy.
  • ALB will usually not send traffic to unhealthy (or initializing) instances. If all instances are unhealthy at the same time, the ALB fails open and sends traffic to all of them; but as long as you have at least 1 healthy instance, no traffic will go to the new unhealthy ones.
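
To make the routing rule above concrete, here is a minimal Python sketch of it. It is a simplified model of the behavior described in this answer, not the ALB's actual implementation; the function name `routable_targets` and the instance IDs are made up for illustration, while the state names follow the target health states the ALB reports.

```python
from typing import Dict, List


def routable_targets(states: Dict[str, str]) -> List[str]:
    """Return the target IDs that would receive traffic.

    `states` maps a target ID to its health state, e.g. "initial",
    "healthy" or "unhealthy". Simplified model (an assumption): only
    healthy targets get traffic, unless there are none at all.
    """
    healthy = [target for target, state in states.items() if state == "healthy"]
    if healthy:
        # At least one healthy target: new or failing targets get nothing.
        return healthy
    # Fail open: no healthy target exists, so traffic goes to every target.
    return list(states)


# A new instance next to an existing healthy one receives no traffic.
print(routable_targets({"i-old": "healthy", "i-new": "initial"}))
```

In the setup from the question, this means a new instance showing as unhealthy on main-alb is harmless as long as at least one older instance stays healthy.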

ASG:

  • In your Auto Scaling Group (ASG), the instance will be 'healthy' by default when it's launched.
  • ASG will ignore failing ELB health checks until the HealthCheckGracePeriod is done (5 minutes in your case).
  • If the instance is still unhealthy at the end of the grace period, the next time ASG does a health check, it will likely mark the instance unhealthy and replace it (it sounds like your application isn't unhealthy for that long, though).
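
The grace-period behavior above can be sketched as a small decision function. This is only an illustration of the rule as described here, assuming the ASG is configured to use ELB health checks; `asg_would_replace` and its parameters are hypothetical names, and the 300-second default is taken from the question.

```python
def asg_would_replace(
    seconds_in_service: float,
    elb_healthy: bool,
    grace_period: float = 300.0,
) -> bool:
    """Return True if the ASG would be expected to replace the instance.

    Simplified model (an assumption): ELB health check failures are
    ignored while the grace period is still running; afterwards a
    failing ELB health check gets the instance replaced.
    """
    if seconds_in_service < grace_period:
        # Still inside HealthCheckGracePeriod: failures are ignored.
        return False
    return not elb_healthy


# An app that needs 90 seconds to start is safe with a 300 second grace period.
print(asg_would_replace(seconds_in_service=90, elb_healthy=False))
```

So the brief unhealthy window only becomes a problem if the application takes longer to start than the grace period.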

Improvements:

  • You can delay the ASG registering the instance to the ALB by adding a launch Lifecycle Hook (LCH) to the ASG. Instances are only registered to the ELB after the LCH ends.
  • As a best practice, you should set the hook's default result to ABANDON, and then have the bootstrapping send a complete-lifecycle-action API call with the CONTINUE result as soon as your scripts have verified the application is up and running correctly. This way, only confirmed working instances are registered to the ALB.
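
As a rough sketch of the bootstrapping step above, the Python below polls the local application and then completes the lifecycle action. It assumes boto3 is installed on the instance and that the instance role allows autoscaling:CompleteLifecycleAction. The helper names (`app_is_up`, `wait_and_signal`), the retry counts and the local URL are illustrative assumptions; only `complete_lifecycle_action` and its parameters are the real Auto Scaling API. Sending ABANDON explicitly when the app never comes up is a choice made here so the instance is terminated without waiting for the hook timeout.

```python
import http.client
import time
import urllib.request
from typing import Callable


def app_is_up(url: str = "http://localhost:80/", timeout: float = 2.0) -> bool:
    """Probe the same path and port as the ALB health check (/ on port 80)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # Connection refused, timeout, HTTP error status, bad response, etc.
        return False


def wait_and_signal(
    client,
    asg_name: str,
    hook_name: str,
    instance_id: str,
    probe: Callable[[], bool] = app_is_up,
    attempts: int = 60,
    delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Wait for the app, then complete the lifecycle action.

    `client` is expected to be a boto3 "autoscaling" client.
    Returns the result that was sent: "CONTINUE" or "ABANDON".
    """
    result = "ABANDON"
    for _ in range(attempts):
        if probe():
            result = "CONTINUE"
            break
        sleep(delay)
    client.complete_lifecycle_action(
        AutoScalingGroupName=asg_name,
        LifecycleHookName=hook_name,
        InstanceId=instance_id,
        LifecycleActionResult=result,
    )
    return result


# Example call at the end of the bootstrap script (names are placeholders):
#   import boto3
#   wait_and_signal(boto3.client("autoscaling"), "my-asg", "my-launch-hook", "i-0123456789abcdef0")
```

With this in place the instance stays in the hook's wait state while the application starts, so the ALB never sees it until the local check has passed.
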
AWS EXPERT, answered 14 days ago
AWS EXPERT, reviewed 13 days ago
