NLB stuck for about a minute after blue/green deployment succeeds (ECS, CodeDeploy)

Hi! I run containers on ECS, and traffic to them is routed through an internal Network Load Balancer that spans two Availability Zones (multi-AZ). I have set up a CodeDeploy blue/green deployment, and there is a delay of about a minute when traffic is redirected to the newly deployed containers. That is, I can't reach them through the load balancer for about a minute, although the new containers are already accepting traffic (when I send requests to them directly, everything works).

I set a delay of about 15 minutes so that the old containers are not terminated right away, but the stall happens at the moment the load balancer "picks up" the new ones. Initially I thought the load balancer did not have time to update and was redirecting traffic to the “killed” containers, but it turned out that this is not the case.

Is this a misconfiguration on my side or a limitation of NLB itself? And can I somehow get around it? Thanks!

P.S. If I use an ALB, everything works without delays, but my requirements only allow an NLB.

  • Could you please check whether "Terminate connections on deregistration" is enabled in your Target Group configuration? Please check more information here

2 Answers
Accepted Answer

The longer registration time for ECS targets behind a Network Load Balancer is a known behavior.

When you register a new target to your Network Load Balancer, it is expected to take between 90 and 180 seconds to complete the registration process. After registration is complete, the Network Load Balancer health check systems will begin to send health checks to the target. A newly registered target must pass health checks for the configured interval to enter service and receive traffic. For example, if you configure your health check for a 30 second interval, and require 3 health checks to become healthy, the minimum time a newly registered target could enter service is 180 seconds (90 seconds for registration, and another 90 (3*30) seconds for passing health checks) after a new target passes its first health check.
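As a rough illustration of the arithmetic above, the sketch below adds the registration time to the time needed for the consecutive passing health checks. The function name and the simple additive model are assumptions made for illustration; only the figures (90 to 180 seconds for registration, a 30 second interval, 3 checks) come from the answer.

```python
def min_seconds_to_enter_service(registration_s: int,
                                 interval_s: int,
                                 healthy_threshold: int) -> int:
    """Estimate when a new target can start receiving traffic.

    Assumes a simple additive model: registration time plus
    `healthy_threshold` consecutive health checks, one per interval.
    """
    return registration_s + interval_s * healthy_threshold


# Figures from the answer: 90-180 s registration, 30 s interval, 3 checks.
fastest = min_seconds_to_enter_service(90, 30, 3)
slowest = min_seconds_to_enter_service(180, 30, 3)
print(f"expected range: {fastest}-{slowest} seconds")
```

Under these assumptions, a shorter health check interval or a lower healthy threshold reduces only the second term; the registration time itself is not configurable.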

Similarly, when you deregister a target from your Network Load Balancer, it is expected to take 90-180 seconds to process the requested deregistration, after which it will no longer receive new connections. During this time the Elastic Load Balancing API will report the target in 'draining' state. The target will continue to receive new connections until the deregistration processing has completed. At the end of the configured deregistration delay, the target will not be included in the describe-target-health response for the Target Group, and will return 'unused' with reason 'Target.NotRegistered' when querying for the specific target.
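The states described above can be read from the `describe-target-health` response. The following is a minimal sketch that interprets a response-shaped dictionary. The helper name, the sample IP addresses, and the wording of the notes are hypothetical; the field names (`TargetHealthDescriptions`, `Target`, `TargetHealth`, `State`, `Reason`) follow the Elastic Load Balancing API response format.

```python
def summarize_target_states(response: dict) -> dict:
    """Map each 'id:port' target to a short note about its state."""
    summary = {}
    for desc in response.get("TargetHealthDescriptions", []):
        target = desc["Target"]
        health = desc["TargetHealth"]
        key = f'{target["Id"]}:{target.get("Port")}'
        state = health["State"]
        reason = health.get("Reason")
        if state == "draining":
            # Deregistration requested but not yet fully processed.
            note = "draining: may still receive new connections"
        elif state == "unused" and reason == "Target.NotRegistered":
            # Returned when querying a target that is no longer registered.
            note = "deregistered"
        else:
            note = state
        summary[key] = note
    return summary


# Hand-written sample data (hypothetical targets), not real API output.
sample = {
    "TargetHealthDescriptions": [
        {"Target": {"Id": "10.0.1.10", "Port": 8080},
         "TargetHealth": {"State": "draining",
                          "Reason": "Target.DeregistrationInProgress"}},
        {"Target": {"Id": "10.0.2.20", "Port": 8080},
         "TargetHealth": {"State": "initial"}},
        {"Target": {"Id": "10.0.1.30", "Port": 8080},
         "TargetHealth": {"State": "unused",
                          "Reason": "Target.NotRegistered"}},
    ]
}
states = summarize_target_states(sample)
for target_key, note in states.items():
    print(target_key, "->", note)
```

With a real client, the same dictionary could be obtained from the API and passed to the helper while watching a deployment to see how long the old and new task sets stay in each state.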

There is an ongoing feature request to reduce the time taken to register and deregister Network Load Balancer targets, but at the moment there is no ETA for its implementation.

answered 10 months ago
  • Thanks for the explanation!

I have "Terminate connections on deregistration" = on and "Deregistration delay" = 60 sec. I tried changing these parameters and did not notice any difference.

Alexey
answered 10 months ago
