Hi,
Please try the steps below; they should help you narrow this down.
To troubleshoot an unexpected ECS task stop:
Step 1: Check ECS Service Events
Go to the ECS Console:
Navigate to your cluster and then to the specific service.
Check Events Tab:
Look for events around the time the task stopped; they often include a specific error message or stop reason.
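The same information is available from the CLI: `describe-tasks` on a recently stopped task returns `stopCode` and `stoppedReason`, which usually name the cause directly (stopped tasks only remain describable for a short while after stopping). A sketch, assuming the AWS CLI is configured; the cluster name is a placeholder, and the task ID is the one from this incident:

```shell
# Placeholder cluster name; the task ID is the one reported in this incident.
CLUSTER="my-cluster"
TASK_ID="9f429fd7a19c88ae18f4ce2546d48bb"

# stopCode and stoppedReason record why ECS stopped the task.
# Guarded so the sketch is a no-op where the AWS CLI is not configured.
if command -v aws >/dev/null 2>&1; then
  aws ecs describe-tasks \
    --cluster "$CLUSTER" \
    --tasks "$TASK_ID" \
    --query 'tasks[0].[stopCode,stoppedReason]' \
    --output text
fi
```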
Step 2: Review ECS Agent Logs
SSH into the EC2 instance that was hosting the task:
Locate ECS agent logs in /var/log/ecs/ecs-agent.log*.
Check Logs:
Look for errors or warnings in the entries around the time of the incident.
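A sketch of the time-window filter. The lines below are illustrative and simplified; the real format in /var/log/ecs/ecs-agent.log* is busier and varies by agent version, so adjust which field holds the timestamp:

```shell
# Illustrative, simplified agent-log lines standing in for the real file.
cat > /tmp/ecs-agent.log <<'EOF'
2024-06-20T20:05:58Z level=info msg="stopping container for task"
2024-06-20T20:06:42Z level=info msg="container change event [STOPPED]"
2024-06-20T21:00:00Z level=info msg="unrelated later entry"
EOF

# ISO 8601 timestamps sort lexicographically, so string comparison is enough
# to keep only the entries inside the ten-minute incident window.
awk '$1 >= "2024-06-20T20:00:00Z" && $1 <= "2024-06-20T20:10:00Z"' /tmp/ecs-agent.log
```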
Step 3: Monitor Resource Utilization
CloudWatch Metrics:
Open CloudWatch and check CPU and memory usage for the tasks and EC2 instances around the incident time to ensure there were no resource shortages.
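This step can be scripted as well. A sketch with the AWS CLI (cluster and service names are placeholders); the `AWS/ECS` namespace publishes `CPUUtilization` and `MemoryUtilization` per service:

```shell
# Placeholder names; substitute your own cluster and service.
CLUSTER="my-cluster"
SERVICE="my-service"
PERIOD=60   # one datapoint per minute around the incident

# Guarded so the sketch is a no-op where the AWS CLI is not configured.
if command -v aws >/dev/null 2>&1; then
  aws cloudwatch get-metric-statistics \
    --namespace AWS/ECS \
    --metric-name CPUUtilization \
    --dimensions Name=ClusterName,Value="$CLUSTER" Name=ServiceName,Value="$SERVICE" \
    --start-time 2024-06-20T19:30:00Z \
    --end-time 2024-06-20T20:30:00Z \
    --period "$PERIOD" \
    --statistics Average Maximum
fi
```

Repeat with `--metric-name MemoryUtilization` to rule out memory pressure.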
Step 4: Investigate Network Issues
Review EC2 Network Logs:
Check /var/log/messages or /var/log/syslog for network-related entries, especially DHCP solicitations or network interface state changes.
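A sketch of what to grep for, run against illustrative syslog entries (paths and message formats vary by distro):

```shell
# Illustrative entries standing in for /var/log/messages or /var/log/syslog.
cat > /tmp/messages.sample <<'EOF'
Jun 20 20:06:30 ip-10-0-1-23 dhclient[812]: DHCPREQUEST on eth0 to 10.0.1.1 port 67
Jun 20 20:06:30 ip-10-0-1-23 dhclient[812]: DHCPACK from 10.0.1.1
Jun 20 20:07:02 ip-10-0-1-23 kernel: eth0: link down
EOF

# DHCP renewals and interface state changes near the stop time are the
# signals worth correlating with the task's STOPPED event.
grep -E 'dhclient|link' /tmp/messages.sample
```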
Step 5: Review Task Definition and ECS Configuration
Task Definition:
Ensure health checks and task timeouts are correctly configured.
Service Configuration:
Verify there are no misconfigurations in the deployment settings or health checks that could cause ECS to cycle tasks.
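The container-level fields worth checking here are `healthCheck` and `stopTimeout` in the task definition. A sketch of an illustrative fragment (the key names are standard task-definition fields; the values are examples, not recommendations):

```shell
# Illustrative container-definition fragment; values are examples only.
cat > /tmp/container-settings.json <<'EOF'
{
  "stopTimeout": 30,
  "healthCheck": {
    "command": ["CMD-SHELL", "curl -f http://localhost:8080/health || exit 1"],
    "interval": 30,
    "timeout": 5,
    "retries": 3,
    "startPeriod": 60
  }
}
EOF

# Pull the two settings back out as a quick sanity check.
grep -E '"stopTimeout"|"interval"' /tmp/container-settings.json
```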
Step 6: Use CloudWatch Logs Insights
Go to CloudWatch Logs:
Navigate to your log group for the ECS service.
Run Insights Query:
Use the following query to filter logs around the incident time.
```
fields @timestamp, @message
| filter @timestamp >= '2024-06-20T20:00:00Z' and @timestamp <= '2024-06-20T20:10:00Z'
| sort @timestamp desc
```
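The same query can be started from the CLI; the `--start-time`/`--end-time` flags already bound the window, so the timestamp filter is not needed there. A sketch (the log group name is a placeholder; the epoch seconds correspond to 19:55 to 20:15 UTC on June 20th, slightly wider than the incident window):

```shell
# Placeholder log group; substitute the one your task definition logs to.
LOG_GROUP="/ecs/my-service"
START=1718913300   # 2024-06-20T19:55:00Z
END=1718914500     # 2024-06-20T20:15:00Z

# Guarded so the sketch is a no-op where the AWS CLI is not configured.
if command -v aws >/dev/null 2>&1; then
  aws logs start-query \
    --log-group-name "$LOG_GROUP" \
    --start-time "$START" \
    --end-time "$END" \
    --query-string 'fields @timestamp, @message | sort @timestamp desc'
fi
```

`start-query` returns a query ID; fetch the results afterwards with `aws logs get-query-results --query-id <id>`.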
If you want more information, please go through these AWS documentation links:
https://docs.aws.amazon.com/AmazonECS/latest/developerguide/stopped-task-errors.html
https://repost.aws/knowledge-center/ecs-tasks-container-exit-issues
Known factors:
A service task (ID: 9f429fd7a19c88ae18f4ce2546d48bb) stopped on June 20th at 20:06:42 UTC. No deployments, scaling operations, reboots, or Docker restarts occurred around that time. CloudTrail logs show no relevant activity. Application logs appear normal for the newly running task.
ECS event logs:
The ECS event logs show the task transitioning to a STOPPED state, initiated by ACS (the ECS Agent Communication Service).
EC2 instance logs (messages-*):
The logs show DHCP client (dhclient) activity around the time of the task stop, suggesting a network lease-renewal attempt. Docker logs show the container failing to exit gracefully within 30 seconds of receiving signal 15 (SIGTERM).
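The 30-second window is Docker's stop behavior: it sends SIGTERM, waits for the container's stop timeout (30 seconds by default; configurable via `stopTimeout` in the task definition), then sends SIGKILL. An entrypoint that traps SIGTERM exits cleanly inside the window. A minimal local demo of the pattern, with a plain shell process standing in for the container's main process:

```shell
# Entrypoint that handles SIGTERM the way a well-behaved container should.
cat > /tmp/entrypoint.sh <<'EOF'
#!/bin/sh
trap 'echo "caught SIGTERM, shutting down"; exit 0' TERM
while true; do sleep 1; done
EOF
chmod +x /tmp/entrypoint.sh

/tmp/entrypoint.sh &    # stand-in for the container's main process
APP_PID=$!
sleep 1
kill -TERM "$APP_PID"   # what Docker sends on task stop
wait "$APP_PID"
STATUS=$?
echo "graceful exit status: $STATUS"
```

A process that ignores SIGTERM (or never receives it, e.g. because it runs as a child of a shell that doesn't forward signals) burns the full timeout and is SIGKILLed, which matches the behavior seen in the Docker logs here.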
Key takeaways:
The absence of the usual triggers for a task stop suggests an external factor. The DHCP renewal attempt and the container's failure to exit gracefully both point toward a network connectivity issue around the time of the stop.
Possible causes:
- Network connectivity issue: A temporary network disruption on the EC2 instance could have made the container unresponsive, leading ECS to stop the task. This aligns with the observed DHCP renewal attempt and the container's termination behavior.
- Resource exhaustion: Less likely without supporting evidence, but memory or CPU pressure could have caused the container to crash and the task to stop.
Recommendations:
- Investigate network logs on the EC2 instance for anomalies around June 20th, 20:06 UTC, such as errors or dropped packets.
- Consider shipping container logs to CloudWatch (for example, via the awslogs log driver) so future task failures are easier to diagnose.
- If network connectivity issues are suspected, review network configurations and confirm the container can reach its external dependencies.
- If resource exhaustion is a concern, monitor resource utilization on the EC2 instances and scale if necessary.

Hi,
thanks!