ECS fails to remove a task from the load balancer target group?


We recently migrated our service to ECS, and we’ve seen a pattern of errors like this a few times:

  1. Our CPU usage is low, so as part of normal autoscaling, ECS starts to reduce the number of tasks by 1
  2. ECS claims to have deregistered 1 target and be draining connections
  3. 90 seconds later (after the deregistration delay) ECS stops the task
  4. Immediately after the task is stopped, the load balancer returns a flood of 502s, all directed at a single IP. We suspect that this is the IP of the task that was deregistered and stopped, but that was somehow not removed from the ALB target group

We don’t have any long-lived connections, so the 90 second deregistration delay should be long enough for the task to finish processing its requests before it’s stopped.
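One way to check the suspicion in step 4 is to tally 502s per target from the ALB access logs. The following is a minimal sketch, assuming access logging is enabled on the ALB and the log files have already been downloaded and decompressed. `count_502s_by_target` is a hypothetical helper name. The field positions follow the documented ALB access log layout, where `target:port` is the fifth space-separated field and `elb_status_code` is the ninth.

```python
from collections import Counter


def count_502s_by_target(log_lines):
    """Count load balancer 502 responses per target (ip:port).

    Only the first nine space-separated fields are used, so the quoted
    request and user-agent fields later in the line do not matter here.
    """
    counts = Counter()
    for line in log_lines:
        fields = line.split()
        if len(fields) < 9:
            continue  # skip blank or truncated lines
        target = fields[4]      # "ip:port", or "-" if no target was chosen
        elb_status = fields[8]  # status code returned by the load balancer
        if elb_status == "502":
            counts[target] += 1
    return counts
```

If nearly all 502s land on the IP of the stopped task, that supports the theory that the target was still receiving traffic after the task was stopped. A target of `-` would instead mean the load balancer never selected a target for those requests.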

It seems that the task selected for removal isn’t actually removed from the ALB target group, even though the ECS logs include messages indicating that it was. The logs include

* service [our-service] deregistered 1 targets in target-group [our-target-group]

followed by

* service [our-service] has begun draining connections on 1 tasks.

Events like this happen rarely (definitely not every time we scale down) but frequently enough to notice. Does anybody have ideas about why these errors might be happening, or how to get more information about what is going on? Thanks in advance.
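To get more information during a scale-down, the target group can be polled directly to see whether the task's IP really moves to `draining` and then disappears. This is a rough sketch, assuming boto3 is installed and credentials are configured. `target_states` and `fetch_target_states` are hypothetical helper names, and the target group ARN is a placeholder to be supplied by the caller. The response shape is the one returned by the ELBv2 `describe_target_health` call.

```python
def target_states(response):
    """Map 'id:port' to (state, reason) from a describe_target_health response."""
    states = {}
    for desc in response.get("TargetHealthDescriptions", []):
        target = desc["Target"]
        health = desc["TargetHealth"]
        key = f"{target['Id']}:{target.get('Port')}"
        # Reason is absent for healthy targets, so fall back to None.
        states[key] = (health["State"], health.get("Reason"))
    return states


def fetch_target_states(target_group_arn):
    """Call the ELBv2 API and summarize target states (needs AWS access)."""
    import boto3  # imported lazily so target_states can be used offline

    client = boto3.client("elbv2")
    response = client.describe_target_health(TargetGroupArn=target_group_arn)
    return target_states(response)
```

Calling `fetch_target_states` every few seconds while ECS scales in, and logging the result with a timestamp, would show whether the stopped task's IP was still listed as `healthy` when the 502s began.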

1 Answer

It's due to load balancer stickiness. Stickiness should be configured by setting the flag for it.
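The answer does not name the flag. Assuming it refers to the target group attribute `stickiness.enabled`, the current settings can be inspected before changing anything. This is a small sketch, with `summarize_attributes` as a hypothetical helper that reads the response of the ELBv2 `describe_target_group_attributes` call (for example via `boto3.client("elbv2")`); attribute values arrive as strings.

```python
def summarize_attributes(response):
    """Extract stickiness and deregistration delay from target group attributes."""
    attrs = {a["Key"]: a["Value"] for a in response.get("Attributes", [])}
    delay = attrs.get("deregistration_delay.timeout_seconds")
    return {
        # The API reports booleans as the strings "true" / "false".
        "stickiness_enabled": attrs.get("stickiness.enabled") == "true",
        # None means the attribute was not present in the response.
        "deregistration_delay_seconds": int(delay) if delay is not None else None,
    }
```

This only reports what is configured. It does not by itself confirm that stickiness is the cause of the 502s.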

answered a year ago
