ECS service deployment stuck in deadlock due to deleted target group

Hi, we recently came across a problem with ECS service deployments that, in our view, shows a lack of robustness. Our setup roughly looks like this: we have an ECS service that is reachable via several domains, which may change (though not often). For technical reasons, requests for the different domains are routed to the service's task containers via separate target groups. Changes to the service are rolled out with a deployment configuration allowing a minimum of 100% and a maximum of 200%.

In our automation, when switching domains, a target group associated with the service might be deleted before the deployment has deregistered the existing container targets. As a result, the deployment gets stuck in a state where it can no longer remove the old task. This can be observed in CloudTrail:

{
    ...
    "eventSource": "elasticloadbalancing.amazonaws.com",
    "eventName": "DescribeTargetGroups",
    "awsRegion": "eu-central-1",
    "sourceIPAddress": "ecs.amazonaws.com",
    "userAgent": "ecs.amazonaws.com",
    "errorCode": "TargetGroupNotFoundException",
    "errorMessage": "One or more target groups not found",
    ...
}
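
For illustration, events like the one above could be picked out of a batch of CloudTrail records with a small filter. This is only a sketch: the function name `find_missing_target_group_errors` is hypothetical, and it assumes the records have already been loaded as Python dicts with the same top-level fields as shown above.

```python
def find_missing_target_group_errors(records):
    """Return the CloudTrail records in which ECS called Elastic Load
    Balancing and got a TargetGroupNotFoundException back.

    `records` is assumed to be an iterable of dicts shaped like the
    event shown above (hypothetical input, e.g. parsed from a log file).
    """
    return [
        record
        for record in records
        if record.get("eventSource") == "elasticloadbalancing.amazonaws.com"
        and record.get("userAgent") == "ecs.amazonaws.com"
        and record.get("errorCode") == "TargetGroupNotFoundException"
    ]
```

Such a filter only detects the stuck state after the fact; it does not prevent it.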

We are aware that our solution should handle this situation better, i.e. the target groups should not be deleted too early, and we are already looking into this. However, we were a bit surprised that the deployment was completely stuck in this case, blocking all subsequent deployments due to the min/max configuration. Could this be handled in a more robust way on the AWS side? And are there any suggestions for how to handle this in our automation? We would prefer not to add polling that waits for the service to reach a steady state after each change, as we would like to keep this asynchronous.
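
One possible direction, sketched here under assumptions rather than as a confirmed fix, is a one-shot guard that runs right before a target group is deleted and simply skips the deletion if the group still looks in use. It does not poll; a skipped deletion would have to be retried later by whatever triggers the automation. The function name `target_group_safe_to_delete` is hypothetical, and `ecs` and `elbv2` are assumed to be boto3 clients (`boto3.client("ecs")` and `boto3.client("elbv2")`) or objects with the same `describe_services` and `describe_target_health` methods.

```python
def target_group_safe_to_delete(ecs, elbv2, cluster, service, target_group_arn):
    """One-shot check (no polling): return True only if the target group
    appears to be unused by the given ECS service.

    Assumptions: `ecs` and `elbv2` behave like boto3 clients; the criteria
    below are a conservative guess, not documented ECS behaviour.
    """
    response = ecs.describe_services(cluster=cluster, services=[service])
    services = response.get("services", [])
    if not services:
        # Service not found: be conservative and do not delete.
        return False
    svc = services[0]

    # The service must no longer reference the target group.
    attached = {lb.get("targetGroupArn") for lb in svc.get("loadBalancers", [])}
    if target_group_arn in attached:
        return False

    # More than one deployment means a rollout is still in progress and
    # old tasks may not have been deregistered yet.
    if len(svc.get("deployments", [])) > 1:
        return False

    # Targets that are still registered or draining show up here.
    health = elbv2.describe_target_health(TargetGroupArn=target_group_arn)
    return len(health.get("TargetHealthDescriptions", [])) == 0
```

The automation would call `elbv2.delete_target_group(TargetGroupArn=...)` only when this returns `True`. Whether this is sufficient to avoid the stuck deployment in every case is untested.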

Thanks in advance

  • Update: It seems that ECS handles this error correctly if only one target group is wired to the ECS service: it prints the error in the events view but does not get stuck. However, if e.g. two target groups are attached and one of them is deleted prematurely, the situation described above occurs.
