ECS fails to remove a task from the load balancer target group?

We recently migrated our service to ECS, and we’ve seen a pattern of errors like this a few times:

  1. Our CPU usage is low, so as part of normal autoscaling, ECS starts to reduce the number of tasks by 1
  2. ECS claims to have deregistered 1 target and be draining connections
  3. 90 seconds later (after the deregistration delay) ECS stops the task
  4. Immediately after the task is stopped, the load balancer returns a flood of 502s, all directed at just one IP. We suspect that this is the IP of the task that was stopped, and that it was somehow never removed from the ELB target group

We don’t have any long-lived connections, so the 90 second deregistration delay should be long enough for the task to finish processing its requests before it’s stopped.

It seems that the task selected for removal isn’t actually removed from the ALB target group, even though the ECS logs include messages indicating that it is. The logs include:

* service \[our-service\] deregistered 1 targets in target-group \[our-target-group\]

followed by:

* service \[our-service\] has begun draining connections on 1 tasks.

Events like this happen rarely (definitely not every time we scale down) but frequently enough to notice. Does anybody have ideas about why these errors might be happening, or how to get more information about what is going on? Thanks in advance.
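
One way to get more information is to check what the load balancer itself reports for the suspect IP while the scale-in is happening. The following is only a sketch: it assumes the target group uses the `ip` target type (so target Ids are task IPs), and the function names and the sample IP are made up for illustration. The response shape is the one returned by the ELBv2 `describe_target_health` call.

```python
def find_targets_by_ip(response: dict, ip: str) -> list:
    """Return (ip, port, state) for each target in the response whose Id equals ip.

    `response` is expected to look like the result of the ELBv2
    describe_target_health call.
    """
    matches = []
    for desc in response.get("TargetHealthDescriptions", []):
        target = desc.get("Target", {})
        if target.get("Id") == ip:
            state = desc.get("TargetHealth", {}).get("State", "unknown")
            matches.append((target["Id"], target.get("Port"), state))
    return matches


def fetch_target_health(target_group_arn: str) -> dict:
    """Fetch the current target health (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the helper above works without it

    client = boto3.client("elbv2")
    return client.describe_target_health(TargetGroupArn=target_group_arn)
```

If the stopped task's IP still shows up here in any state other than `draining` after ECS logs the deregistration, that would support the suspicion that the target was never removed. An empty result means the load balancer no longer lists it.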

1 Answer

This is due to load balancer stickiness, which should be configured by setting the corresponding flag on the target group.
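
To check how stickiness and the deregistration delay are currently set, the target group attributes can be read back. This is a rough sketch, not a confirmed fix: the function names are hypothetical, while the attribute keys (`stickiness.enabled`, `stickiness.type`, `deregistration_delay.timeout_seconds`) and the response shape are those of the ELBv2 `describe_target_group_attributes` call.

```python
def summarize_target_group_attributes(response: dict) -> dict:
    """Pull the stickiness and deregistration settings out of an
    ELBv2 describe_target_group_attributes response."""
    attrs = {a["Key"]: a["Value"] for a in response.get("Attributes", [])}
    delay = attrs.get("deregistration_delay.timeout_seconds")
    return {
        # Attribute values are strings, so compare against "true".
        "stickiness_enabled": attrs.get("stickiness.enabled") == "true",
        "stickiness_type": attrs.get("stickiness.type"),
        "deregistration_delay_seconds": int(delay) if delay is not None else None,
    }


def fetch_target_group_attributes(target_group_arn: str) -> dict:
    """Fetch the attributes (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the summary helper works without it

    client = boto3.client("elbv2")
    return client.describe_target_group_attributes(TargetGroupArn=target_group_arn)
```

Usage would be along the lines of `summarize_target_group_attributes(fetch_target_group_attributes(arn))`, where `arn` is your target group's ARN.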

answered 1 year ago
