ECS fails to remove a task from the load balancer target group?

0

We recently migrated our service to ECS, and we’ve seen a pattern of errors like this a few times:

  1. Our CPU usage is low, so as part of normal autoscaling, ECS starts to reduce the number of tasks by 1
  2. ECS claims to have deregistered 1 target and be draining connections
  3. 90 seconds later (after the deregistration delay) ECS stops the task
  4. Immediately after the task is stopped, a flood of load balancer 502s happens, all directed at just one IP. We suspect that this is the IP of the task that was removed and stopped, but somehow not removed from the ELB target group
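One way to test the suspicion in step 4 is to tally the 502s per target address from the ALB access logs. The following is a minimal sketch, assuming access logging is enabled on the load balancer and that the log lines follow the standard space-separated ALB format (target `ip:port` in the fifth field, ELB status code in the ninth). The function name `count_502s_by_target` is made up for illustration.

```python
from collections import Counter


def count_502s_by_target(log_lines):
    """Count ELB 502 responses per target address in ALB access log lines.

    Assumes the standard ALB access log layout, where the first nine fields
    are unquoted: type, time, elb, client:port, target:port, three processing
    times, and elb_status_code.
    """
    counts = Counter()
    for line in log_lines:
        fields = line.split()
        if len(fields) < 9:
            continue  # skip blank or truncated lines
        target, elb_status = fields[4], fields[8]
        if elb_status == "502":
            counts[target] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample lines, shortened after the request field.
    sample = [
        'http 2020-01-01T00:00:00.000000Z app/my-alb/abc 203.0.113.7:4000 '
        '10.0.0.5:8080 -1 -1 -1 502 - 0 0 "GET http://example.com:80/ HTTP/1.1"',
        'http 2020-01-01T00:00:01.000000Z app/my-alb/abc 203.0.113.7:4001 '
        '10.0.0.6:8080 0.001 0.002 0.000 200 200 0 57 "GET http://example.com:80/ HTTP/1.1"',
    ]
    for target, n in count_502s_by_target(sample).most_common():
        print(target, n)
```

If all the 502s map to the private IP of the task that was just stopped, that supports the theory that the load balancer was still routing to it after deregistration was reported.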

We don’t have any long-lived connections, so the 90 second deregistration delay should be long enough for the task to finish processing its requests before it’s stopped.

It seems that the task selected for removal isn't actually removed from the ALB target group, even though the ECS service events include messages indicating that it was. The events include:

* service \[our-service\] deregistered 1 targets in target-group \[our-target-group\]
followed by
* service \[our-service\] has begun draining connections on 1 tasks.

Events like this happen rarely (definitely not every time we scale down) but frequently enough to notice. Does anybody have ideas about why these errors might be happening, or how to get more information about what is going on? Thanks in advance.
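To get more information during a scale-down, one option is to poll the target group and watch what state the load balancer reports for the suspect IP. The sketch below assumes boto3 is installed, AWS credentials are configured, and the target group uses IP targets (as with `awsvpc` networking). The helper names `states_for_ip` and `fetch_states_for_ip` are hypothetical. `describe_target_health` is the ELBv2 API call that returns each target's state, for example `healthy` or `draining`.

```python
def states_for_ip(health_response, ip):
    """Return (port, state, reason) tuples for targets whose Id equals ip.

    health_response is the dict returned by the ELBv2
    describe_target_health call.
    """
    matches = []
    for desc in health_response.get("TargetHealthDescriptions", []):
        target = desc.get("Target", {})
        if target.get("Id") == ip:
            health = desc.get("TargetHealth", {})
            matches.append(
                (target.get("Port"), health.get("State"), health.get("Reason"))
            )
    return matches


def fetch_states_for_ip(target_group_arn, ip):
    """Query the target group and report the state of the given IP."""
    import boto3  # imported lazily so states_for_ip works without AWS access

    client = boto3.client("elbv2")
    response = client.describe_target_health(TargetGroupArn=target_group_arn)
    return states_for_ip(response, ip)
```

After the "deregistered 1 targets" event, the task's IP would be expected to show as `draining` and then disappear from the response. If it still shows as `healthy` when the task stops, the deregistration did not take effect.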

evan
Asked 2 years ago, 942 views
1 Answer
0

It's due to load balancer stickiness; it should be set up by adding the flag for it.

Answered 1 year ago
