AWS ECS Task with status "Task is stopping" not being stopped


Earlier today, I deployed a new version of an ECS service that uses an EC2 Auto Scaling group (ASG) capacity provider. The service runs a minimum of 3 tasks. Before the deployment it had scaled up to, let's say, 15 tasks. Then the deployment started (rev1->rev2), and the service scaled back down to 3 tasks.

I have ECS deployments configured with a minimum healthy percent of 50% and a maximum percent of 600%.

In this situation, all but 1 of the rev1 tasks had a desired status of "Stopped" (as expected), and the task overview read "Task is stopping. Scaling activity initiated by deploy ecs-svc/----". However, these tasks never received a SIGTERM/SIGKILL signal, so they never actually stopped. A rev2 task was pending but never started. Furthermore, in the cluster's list of services, the tasks with a desired status of "Stopped" did not count towards the number of "running" tasks, so imagine my surprise when I read "1 running, 2 pending" and then saw 15 rev1 tasks running, 14 of them waiting indefinitely to stop.

I had to fix this manually by terminating some of the EC2 instances that hosted idle rev1 tasks, or by killing the containers directly (which is what I expected the ECS deployment to do). That gave the rev2 task room to start, and the deployment then continued as expected.

My question is, why were the rev1 tasks with "stopped" as desired status not being stopped? Was the deployment preventing the tasks from being stopped somehow?

asked 2 months ago, 135 views
1 Answer

The situation you described is unusual and not the expected behavior for Amazon ECS deployments. Typically, when tasks are marked with a desired status of "Stopped" during a deployment, they should receive a SIGTERM signal and then be stopped if they don't exit gracefully within the specified stop timeout period.

There are a few potential reasons why the rev1 tasks were not being stopped as expected:

  1. Task Protection: If the tasks had task protection enabled, ECS would not stop them during scale-in or deployment activities until the protection expired. Protection lasts 2 hours by default and can be set for up to 48 hours, so it delays termination but does not prevent it indefinitely.

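  2. Stop Timeout Configuration: A long stop timeout delays the SIGKILL that follows SIGTERM, but it does not delay the SIGTERM itself. The default is 30 seconds. On EC2 it is set by the stopTimeout container parameter or the agent's ECS_CONTAINER_STOP_TIMEOUT variable, and on Fargate it cannot exceed 120 seconds. Because the tasks in question never received a SIGTERM, the timeout alone is unlikely to explain this case.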

  3. Resource Constraints: If the cluster didn't have enough resources to start the new rev2 tasks, it might have delayed stopping the old rev1 tasks. However, this should not prevent the stopping process entirely.

  4. Networking or API Issues: There could have been temporary networking issues or API throttling that prevented the stop commands from being properly executed.

  5. ECS Agent Issues: If the ECS agent on the EC2 instances was experiencing problems, it might not have properly processed the stop commands for the tasks.
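
To narrow down which of the causes above applies, it can help to list the tasks whose desired status is STOPPED but whose last reported status is not, grouped by the container instance they run on. The following is a minimal sketch, not an official diagnostic: it assumes you already have a response dictionary in the shape returned by the ECS DescribeTasks API (for example from boto3's describe_tasks), and the helper name find_stuck_tasks and the sample ARNs are hypothetical.

```python
from collections import defaultdict


def find_stuck_tasks(describe_tasks_response):
    """Group tasks that should be stopped but are not yet, by container instance.

    Expects a dict shaped like the ECS DescribeTasks response
    (a "tasks" list whose items carry taskArn, desiredStatus, lastStatus).
    """
    stuck = defaultdict(list)
    for task in describe_tasks_response.get("tasks", []):
        wants_stop = task.get("desiredStatus") == "STOPPED"
        has_stopped = task.get("lastStatus") == "STOPPED"
        if wants_stop and not has_stopped:
            instance = task.get("containerInstanceArn", "unknown")
            stuck[instance].append(task["taskArn"])
    return dict(stuck)


# Hypothetical sample data, shortened for readability.
sample = {
    "tasks": [
        {"taskArn": "task/rev1-a", "desiredStatus": "STOPPED",
         "lastStatus": "RUNNING", "containerInstanceArn": "ci/1"},
        {"taskArn": "task/rev1-b", "desiredStatus": "STOPPED",
         "lastStatus": "STOPPED", "containerInstanceArn": "ci/1"},
        {"taskArn": "task/rev2-a", "desiredStatus": "RUNNING",
         "lastStatus": "PENDING", "containerInstanceArn": "ci/2"},
    ]
}

print(find_stuck_tasks(sample))
```

If every stuck task sits on the same few container instances, the agent on those instances is the first thing to check. If they are spread across the whole cluster, task protection or a service-level issue is more likely.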

The deployment itself should not prevent tasks from being stopped. In fact, during a deployment, ECS actively tries to stop old tasks to make room for new ones, especially when scaling down.
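
For reference, the deployment settings from the question translate into concrete task counts. The sketch below assumes the rounding described in the ECS documentation (the minimum healthy percent is rounded up, the maximum percent is rounded down) and a desired count of 3. The function name deployment_bounds is hypothetical.

```python
def deployment_bounds(desired_count, minimum_healthy_percent, maximum_percent):
    """Return (lower, upper) task limits during a rolling deployment.

    Assumes the lower limit is rounded up and the upper limit is
    rounded down. Integer arithmetic avoids floating-point surprises.
    """
    lower = -(-desired_count * minimum_healthy_percent // 100)  # ceiling division
    upper = desired_count * maximum_percent // 100              # floor division
    return lower, upper


lower, upper = deployment_bounds(3, 50, 600)
print(f"at least {lower} healthy tasks, at most {upper} tasks in total")
```

With these settings the scheduler has plenty of headroom on paper, so the limits themselves should not have blocked the old tasks from being stopped.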

To prevent this issue in the future, you could consider the following:

  1. Ensure that your tasks are not using task protection unless absolutely necessary.
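  2. Configure stop timeouts (stopTimeout in the task definition, or ECS_CONTAINER_STOP_TIMEOUT on the agent) that match how long your application needs to shut down cleanly.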
  3. Monitor your ECS agent health on EC2 instances.
  4. Use managed instance draining with your Auto Scaling Group to help handle task replacement more gracefully.
  5. If the issue persists, you may want to contact AWS support for a more in-depth investigation of your specific setup.
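
Items 1 and 3 in the list above can be checked from API responses. The following is a rough sketch under stated assumptions: the input dictionaries follow the shapes of the ECS DescribeContainerInstances and GetTaskProtection responses (as returned by boto3's describe_container_instances and get_task_protection), and the helper names and sample values are hypothetical.

```python
def disconnected_instances(describe_container_instances_response):
    """Return the IDs of container instances whose ECS agent is not connected."""
    result = []
    for ci in describe_container_instances_response.get("containerInstances", []):
        if not ci.get("agentConnected", False):
            # Prefer the EC2 instance ID, fall back to the container instance ARN.
            result.append(ci.get("ec2InstanceId", ci.get("containerInstanceArn")))
    return result


def protected_tasks(get_task_protection_response):
    """Return the ARNs of tasks that currently have task protection enabled."""
    return [
        t["taskArn"]
        for t in get_task_protection_response.get("protectedTasks", [])
        if t.get("protectionEnabled")
    ]


# Hypothetical sample responses.
instances = {
    "containerInstances": [
        {"ec2InstanceId": "i-aaa", "agentConnected": True},
        {"ec2InstanceId": "i-bbb", "agentConnected": False},
    ]
}
protection = {
    "protectedTasks": [
        {"taskArn": "task/rev1-a", "protectionEnabled": True},
        {"taskArn": "task/rev1-b", "protectionEnabled": False},
    ]
}

print(disconnected_instances(instances))
print(protected_tasks(protection))
```

An instance with a disconnected agent cannot receive stop commands, which would match tasks that stay in a stopping state without ever getting a SIGTERM.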

Remember that manually killing EC2 instances or containers should be a last resort. In normal operations, ECS should manage this process automatically during deployments.
Sources
Community | Easier EC2 instance maintenance with managed draining for Amazon ECS capacity providers
Amazon ECS stopped tasks error messages - Amazon Elastic Container Service
Stopping an Amazon ECS task - Amazon Elastic Container Service

answered 2 months ago
