ECS fails to remove a task from the load balancer target group?


We recently migrated our service to ECS, and we’ve seen a pattern of errors like this a few times:

  1. Our CPU usage is low, so as part of normal autoscaling, ECS reduces the number of tasks by 1.
  2. ECS reports that it has deregistered 1 target and is draining connections.
  3. 90 seconds later (after the deregistration delay), ECS stops the task.
  4. Immediately after the task is stopped, the load balancer returns a flood of 502s, all directed at a single IP. We suspect that this is the IP of the task that was stopped but somehow never removed from the ELB target group.

We don’t have any long-lived connections, so the 90 second deregistration delay should be long enough for the task to finish processing its requests before it’s stopped.

It seems that the task selected for removal isn’t actually removed from the ALB target group, even though the ECS logs include messages indicating that it is. The logs include

* service [our-service] deregistered 1 targets in target-group [our-target-group]

followed by

* service [our-service] has begun draining connections on 1 tasks.

Events like this happen rarely (definitely not every time we scale down) but frequently enough to notice. Does anybody have ideas about why these errors might be happening, or how to get more information about what is going on? Thanks in advance.
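
One way to get more information when this happens might be to ask the load balancer directly whether the suspect IP is still registered, and in what state. The sketch below is only an illustration: `find_lingering_targets` is a hypothetical helper name, it assumes the target group uses IP targets (so the target `Id` is the task's IP), and it assumes `elbv2_client` is a boto3 `elbv2` client or anything exposing the same `describe_target_health` call.

```python
def find_lingering_targets(elbv2_client, target_group_arn, suspect_ip):
    """Return the health entries for targets whose Id equals suspect_ip.

    Hypothetical helper: assumes IP-type targets, so Target.Id is the task IP.
    """
    response = elbv2_client.describe_target_health(TargetGroupArn=target_group_arn)
    matches = []
    for description in response["TargetHealthDescriptions"]:
        target = description["Target"]
        if target["Id"] != suspect_ip:
            continue
        health = description["TargetHealth"]
        matches.append({
            "ip": target["Id"],
            "port": target.get("Port"),
            "state": health["State"],        # e.g. "healthy" or "draining"
            "reason": health.get("Reason"),  # absent for healthy targets
        })
    return matches


# Example usage (requires boto3 and AWS credentials; the ARN and IP are placeholders):
#   import boto3
#   client = boto3.client("elbv2")
#   print(find_lingering_targets(client, "<your-target-group-arn>", "<suspect-ip>"))
```

An empty result would suggest the IP is no longer in the target group. An entry still in the `healthy` state after ECS logged the deregistration would support the suspicion described above.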

evan
Asked 2 years ago · 963 views
1 Answer

This is due to load balancer stickiness; it should be configured by setting the flag for it.
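
To check how stickiness is currently configured on the target group, the attributes can be read back from the load balancer. This is a rough sketch rather than a confirmed fix: `read_target_group_settings` is a hypothetical helper name, and `elbv2_client` is assumed to be a boto3 `elbv2` client or an object with the same `describe_target_group_attributes` call. It also returns the deregistration delay, since that value is mentioned in the question.

```python
def read_target_group_settings(elbv2_client, target_group_arn):
    """Read stickiness and deregistration delay from a target group.

    Hypothetical helper: attribute values come back as strings, so they are
    converted here. Missing attributes are reported as None.
    """
    response = elbv2_client.describe_target_group_attributes(
        TargetGroupArn=target_group_arn
    )
    attributes = {item["Key"]: item["Value"] for item in response["Attributes"]}

    stickiness = attributes.get("stickiness.enabled")
    delay = attributes.get("deregistration_delay.timeout_seconds")
    return {
        "stickiness_enabled": None if stickiness is None else stickiness == "true",
        "deregistration_delay_seconds": None if delay is None else int(delay),
    }


# Example usage (requires boto3 and AWS credentials; the ARN is a placeholder):
#   import boto3
#   client = boto3.client("elbv2")
#   print(read_target_group_settings(client, "<your-target-group-arn>"))
```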

Answered 1 year ago
