The behavior you describe is not expected for Amazon ECS deployments. Normally, once a task's desired status is set to STOPPED during a deployment, its containers receive a SIGTERM signal and, if they have not exited by the end of the configured stop timeout, they are forcibly killed with SIGKILL.
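As an illustration of the graceful path, here is a minimal sketch of an application that reacts to SIGTERM and exits before the stop timeout runs out. The `run_worker` loop and its return value are hypothetical stand-ins for your own workload, and the sketch assumes the Python process actually receives the signal (for example, it runs as the container's main process rather than behind a shell that does not forward signals).

```python
import signal
import threading

# Set when the container runtime asks the process to stop.
shutdown_requested = threading.Event()


def handle_sigterm(signum, frame):
    """Record the stop request; the main loop decides how to wind down."""
    shutdown_requested.set()


# Must be registered from the main thread of the process.
signal.signal(signal.SIGTERM, handle_sigterm)


def run_worker(poll_seconds=1.0):
    """Hypothetical main loop: work until SIGTERM arrives, then return."""
    while not shutdown_requested.is_set():
        # ... process one unit of work here ...
        shutdown_requested.wait(poll_seconds)
    # Flush buffers and close connections here, before the stop timeout expires.
    return "stopped cleanly"
```

If a container ignores SIGTERM entirely, it will only go away when the stop timeout elapses and SIGKILL is sent.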
There are a few potential reasons why the rev1 tasks were not being stopped as expected:
- Task Protection: If task protection was enabled on the tasks, ECS will not stop them during scale-in or deployment activities until the protection is removed or expires (2 hours by default). This can delay termination, but it should not prevent it indefinitely.
- Stop Timeout Configuration: A long stop timeout delays the forced termination of a task. This rarely explains tasks that linger for a long time, though: on Fargate the stopTimeout is capped at 120 seconds, and on EC2 the default is 30 seconds unless the container's stopTimeout or the agent's ECS_CONTAINER_STOP_TIMEOUT setting raises it.
- Resource Constraints: If the cluster didn't have enough resources to start the new rev2 tasks, ECS may have delayed stopping the old rev1 tasks to keep the service at its minimum healthy percent. However, this should not prevent the stopping process entirely.
- Networking or API Issues: Temporary networking problems or API throttling could have prevented the stop commands from being executed properly.
- ECS Agent Issues: If the ECS agent on the EC2 instances was unhealthy or disconnected, it might not have processed the stop commands for the tasks.
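To narrow down which of these causes applies, it can help to list the tasks that ECS has asked to stop but that have not reached STOPPED. The following is a rough sketch using boto3; the `find_stuck_tasks` and `describe_stopping_tasks` helper names and the five-minute grace period are assumptions for illustration, while `desiredStatus`, `lastStatus`, `stoppingAt`, and `taskArn` are fields of the `describe_tasks` response.

```python
from datetime import datetime, timedelta, timezone


def find_stuck_tasks(tasks, now=None, grace=timedelta(minutes=5)):
    """Return ARNs of tasks asked to stop that are still not STOPPED.

    A task with no stoppingAt timestamp is also reported, because it has
    not even begun the RUNNING -> STOPPING transition yet.
    """
    now = now or datetime.now(timezone.utc)
    stuck = []
    for task in tasks:
        if task.get("desiredStatus") != "STOPPED":
            continue
        if task.get("lastStatus") == "STOPPED":
            continue
        stopping_at = task.get("stoppingAt")
        if stopping_at is None or now - stopping_at > grace:
            stuck.append(task["taskArn"])
    return stuck


def describe_stopping_tasks(cluster, service):
    """Fetch task descriptions with boto3 (needs AWS credentials)."""
    import boto3  # imported lazily so the helper above works without it

    ecs = boto3.client("ecs")
    arns = ecs.list_tasks(
        cluster=cluster, serviceName=service, desiredStatus="STOPPED"
    )["taskArns"]
    if not arns:
        return []
    return ecs.describe_tasks(cluster=cluster, tasks=arns)["tasks"]
```

Calling `find_stuck_tasks(describe_stopping_tasks("my-cluster", "my-service"))` with your own cluster and service names would then give a starting list of tasks to inspect.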
The deployment itself should not prevent tasks from being stopped. In fact, during a deployment, ECS actively tries to stop old tasks to make room for new ones, especially when scaling down.
To prevent this issue in the future, you could consider the following:
- Ensure that your tasks are not using task protection unless absolutely necessary.
- Configure appropriate stop timeouts for your tasks.
- Monitor your ECS agent health on EC2 instances.
- Use managed instance draining with your Auto Scaling Group to help handle task replacement more gracefully.
- If the issue persists, you may want to contact AWS support for a more in-depth investigation of your specific setup.
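The first three checks above can be roughly automated. The sketch below is one possible approach, not an official tool: the helper names and the 120-second threshold are assumptions, and the functions only inspect dictionaries shaped like the boto3 responses of `describe_task_definition`, `describe_container_instances`, and `get_task_protection`.

```python
def long_stop_timeouts(task_definition, limit_seconds=120):
    """Map container name -> stopTimeout for containers above the limit."""
    return {
        c["name"]: c["stopTimeout"]
        for c in task_definition.get("containerDefinitions", [])
        if c.get("stopTimeout", 0) > limit_seconds
    }


def disconnected_agents(container_instances):
    """Return ARNs of container instances whose ECS agent is not connected."""
    return [
        ci["containerInstanceArn"]
        for ci in container_instances
        if not ci.get("agentConnected", False)
    ]


def still_protected(protected_tasks):
    """Return ARNs of tasks that currently have task protection enabled."""
    return [
        t["taskArn"] for t in protected_tasks if t.get("protectionEnabled")
    ]


def collect(cluster, task_definition_name):
    """Gather the three reports with boto3 (needs AWS credentials)."""
    import boto3  # imported lazily so the helpers above work without it

    ecs = boto3.client("ecs")
    task_def = ecs.describe_task_definition(
        taskDefinition=task_definition_name
    )["taskDefinition"]
    ci_arns = ecs.list_container_instances(cluster=cluster)[
        "containerInstanceArns"
    ]
    instances = (
        ecs.describe_container_instances(
            cluster=cluster, containerInstances=ci_arns
        )["containerInstances"]
        if ci_arns
        else []
    )
    task_arns = ecs.list_tasks(cluster=cluster)["taskArns"]
    protected = (
        ecs.get_task_protection(cluster=cluster, tasks=task_arns)[
            "protectedTasks"
        ]
        if task_arns
        else []
    )
    return {
        "long_stop_timeouts": long_stop_timeouts(task_def),
        "disconnected_agents": disconnected_agents(instances),
        "protected_tasks": still_protected(protected),
    }
```

An empty result in all three reports suggests the cause lies elsewhere, for example in capacity or API throttling.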
Remember that manually killing EC2 instances or containers should be a last resort. In normal operations, ECS should manage this process automatically during deployments.
Sources
Community | Easier EC2 instance maintenance with managed draining for Amazon ECS capacity providers
Amazon ECS stopped tasks error messages - Amazon Elastic Container Service
Stopping an Amazon ECS task - Amazon Elastic Container Service