CodeDeploy hooks run in the wrong order during scale-in

The application is deployed through CodeDeploy to multiple EC2 instances and serves users over WebSocket connections alongside plain HTTP(S) requests. During a deployment, CodeDeploy deregisters each instance from the ELB one at a time, performs the deployment, and re-registers the instance. While an instance is deregistering, no new connections reach it, but existing WebSocket connections stay open until the deregistration timeout breaks them. To give users a seamless experience during deployments, we use the BeforeBlockTraffic CodeDeploy hook to ask WebSocket clients to reconnect, so that the ELB routes each new WebSocket connection to another instance.

So the order of operations for a CodeDeploy deployment is the following:

  • BeforeBlockTraffic notifies the instance about the upcoming connection termination
  • BlockTraffic runs for the ELB deregistration timeout (300 seconds in our case), after which all existing connections are severed
  • all other deployment steps run.
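
The BeforeBlockTraffic step in this list could be sketched as a small hook script like the one below. It is only an illustration: the admin endpoint `http://127.0.0.1:8080/admin/drain` and the idea that the application reacts to it by asking its WebSocket clients to reconnect are assumptions, not details of the actual setup.

```python
import sys
import urllib.request

# Hypothetical admin endpoint exposed by the application on localhost.
# On receiving the POST, the app is assumed to tell its WebSocket clients to reconnect.
DRAIN_URL = "http://127.0.0.1:8080/admin/drain"


def request_drain(url: str = DRAIN_URL, timeout: float = 5.0) -> int:
    """POST to the drain endpoint and return the HTTP status code."""
    # The call is local, so skip any proxy configured in the environment.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    req = urllib.request.Request(url, data=b"", method="POST")
    with opener.open(req, timeout=timeout) as resp:
        return resp.status


def main() -> int:
    """Return 0 on success so CodeDeploy continues; non-zero fails the hook."""
    try:
        status = request_drain()
    except OSError as exc:  # URLError, HTTPError and socket timeouts are OSError subclasses
        print(f"drain request failed: {exc}", file=sys.stderr)
        return 1
    return 0 if 200 <= status < 300 else 1
```

A hook entry point would finish with `sys.exit(main())`, and the script would be listed under the BeforeBlockTraffic event in the AppSpec file.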

Unfortunately, our approach does not work correctly when the EC2 Auto Scaling group terminates an instance because of a scale-in operation. Because the Auto Scaling group is connected to CodeDeploy through lifecycle hooks, the CodeDeploy actions run after the deregistration is complete, as stated in https://docs.aws.amazon.com/autoscaling/ec2/userguide/ec2-auto-scaling-lifecycle.html#as-lifecycle-scale-in. In this case the order of operations is the following:

  • EC2 ASG Lifecycle: Terminating:Wait state
  • ELB: Draining connections
  • ELB: Severing existing connections
  • EC2 ASG Lifecycle: Runs terminating hooks
  • CodeDeploy: BeforeBlockTraffic notifies the instance when all connections are already terminated, which is not what we expected
  • CodeDeploy: BlockTraffic waits for the health check timeout (60 seconds for some reason), which is also not what we expected.

...

So the question is: how can I notify the instance about the upcoming ELB deregistration in a way that works correctly for both deployments and scale-in actions?

2 Answers
Accepted Answer

Do you need the target group (TG) associated with the ASG (for example, are you using ELB health checks on the ASG)?

If not, you might be able to leave the TG off the ASG so that the ASG doesn't deregister the instances, and instead specify the TG only on the CodeDeploy deployment group. I haven't tested the new CodeDeploy termination hook on ASG yet, but if that workflow doesn't deregister the instance, you could add deregistration as a step in your CodeDeploy scripts. The only major concern I can think of with this option is that if you're deregistering many instances at once, you might hit ELB API throttling on the Deregister calls.
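
As a rough, untested sketch of what such a deregistration step could look like: the function below takes an object that behaves like `boto3.client("elbv2")`, deregisters one instance, and polls until the target reports `unused`. The function name, the polling interval, and the 330-second timeout (a little over a 300-second deregistration delay) are illustrative assumptions; the client-notification step would run before this call.

```python
import time


def deregister_and_wait(elbv2, target_group_arn, instance_id,
                        poll_seconds=10.0, timeout_seconds=330.0, sleep=time.sleep):
    """Deregister one instance from a target group and wait until draining ends.

    `elbv2` is expected to behave like boto3.client("elbv2").
    Returns True once the target reports "unused", False if the timeout is reached.
    """
    target = [{"Id": instance_id}]
    elbv2.deregister_targets(TargetGroupArn=target_group_arn, Targets=target)

    waited = 0.0
    while waited < timeout_seconds:
        resp = elbv2.describe_target_health(
            TargetGroupArn=target_group_arn, Targets=target
        )
        states = [d["TargetHealth"]["State"] for d in resp["TargetHealthDescriptions"]]
        # A deregistering target reports "draining" and then "unused".
        if all(state == "unused" for state in states):
            return True
        sleep(poll_seconds)
        waited += poll_seconds
    return False
```

In a hook script this might be called as `deregister_and_wait(boto3.client("elbv2"), tg_arn, instance_id)`, where `tg_arn` and `instance_id` are placeholders. For the throttling concern, one option is to create the client with a botocore `Config` that raises the retry limit, for example `Config(retries={"max_attempts": 10, "mode": "adaptive"})`.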

AWS
answered 7 days ago
  • Thank you! I have removed the link between the ASG and the ELB, and deregistration is now part of the CodeDeploy workflow.

You can adjust your approach to handle scale-in events more effectively.

1. Detect Scale-In Events: Implement a mechanism to detect when an instance is scheduled for termination due to a scale-in event triggered by the EC2 Auto Scaling group.

2. Pre-Drain Notification: Before the ELB starts draining connections from the instance, send a notification to the instance indicating that it will be deregistered from the ELB soon. This notification should give the instance enough time to gracefully handle existing WebSocket connections and inform clients to reconnect to other instances.

3. Handle Graceful Connection Closure: Upon receiving the pre-drain notification, your application running on the instance should start gracefully closing WebSocket connections. This involves informing connected clients to reconnect to other instances.

4. Update CodeDeploy Hooks: Adjust your BeforeBlockTraffic and BlockTraffic CodeDeploy hooks to account for the pre-drain notification. These hooks should wait for a shorter duration since the instance has already started gracefully closing connections.

5. Testing and Validation: Thoroughly test your solution to ensure that WebSocket connections are gracefully handled during both deployment and scale-in events. Validate that clients can seamlessly reconnect to other instances without service interruption.

answered 10 days ago
  • I appreciate your effort in answering the question, but without technical details the answer does not look complete.

    1. How can I detect an automatic scale-in event? Is there any mechanism to provide a hook for notifications that fires when the instance is chosen, rather than after the Auto Scaling group lifecycle has finished draining the connections?
    2. How can one set up the ELB to notify the instance about the upcoming connection draining? I don't see any related functionality in the ELB documentation.
    3. That is what we are already doing.
    4. That is what we are already doing, and it does not work for Auto Scaling group events because connection draining happens before BeforeBlockTraffic, as I stated in the original question.
