I want to troubleshoot my Amazon Elastic Container Service (Amazon ECS) task that stopped.
Resolution
Use the DescribeTasks API to view the details of a stopped task. To troubleshoot your task, check the task's stopped reason (stoppedReason) and each container's exit code (exitCode). The details for the stopped task appear only for 1 hour in the API results. To keep stopped task details for longer, use the amazon-ecs-stopped-tasks-cwlogs template on the GitHub website.
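The following minimal sketch pulls the troubleshooting fields out of a DescribeTasks response. It assumes that the response is the dictionary that boto3's describe_tasks call returns. The function name summarize_stopped_tasks is hypothetical, and the cluster name and task ARN in the usage comment are placeholders.

```python
from typing import Any


def summarize_stopped_tasks(response: dict[str, Any]) -> list[dict[str, Any]]:
    """Collect the stop details of each STOPPED task in a DescribeTasks response.

    `response` is assumed to be the dict returned by
    boto3.client("ecs").describe_tasks(cluster=..., tasks=[...]).
    """
    summaries = []
    for task in response.get("tasks", []):
        if task.get("lastStatus") != "STOPPED":
            continue  # running or pending tasks have no final stop reason yet
        summaries.append(
            {
                "taskArn": task.get("taskArn"),
                "stopCode": task.get("stopCode"),
                "stoppedReason": task.get("stoppedReason"),
                # exitCode can be missing, for example when a container never started.
                "containers": [
                    {
                        "name": container.get("name"),
                        "exitCode": container.get("exitCode"),
                        "reason": container.get("reason"),
                    }
                    for container in task.get("containers", [])
                ],
            }
        )
    return summaries


# Usage (needs boto3 and AWS credentials; "my-cluster" and the ARN are placeholders):
#   import boto3
#   ecs = boto3.client("ecs")
#   response = ecs.describe_tasks(cluster="my-cluster", tasks=["<task-arn>"])
#   for summary in summarize_stopped_tasks(response):
#       print(summary)
```

Because the function takes a plain dictionary, you can also run it against JSON output that you saved from the AWS CLI before the stopped task details expire.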
Reasons for a stopped task
Essential container in task exited
If your essential container exited, then use the container's exit code to troubleshoot the issue:
- The 0 exit code occurs when the ENTRYPOINT or CMD process completes successfully and the container stops.
- The 1 exit code occurs when there's an application error. To troubleshoot this exit code, review your application logs.
- The 137 exit code occurs when the container doesn't stop within the default 30-second period after it receives a SIGTERM, so Amazon ECS sends a SIGKILL to force the container to exit. To change the default 30-second period, set the ECS_CONTAINER_STOP_TIMEOUT parameter in the Amazon ECS container agent configuration.
Note: This exit code can also occur because of an Out-of-Memory (OOM) error. To check your resource usage, review your Amazon CloudWatch metrics for Amazon ECS.
- The 139 exit code occurs when the application tries to access a memory region that isn't available (a segmentation fault). A segmentation fault can also occur when an environment variable placeholder is unset or has a value that isn't valid. To troubleshoot this issue, review the CloudWatch logs for your container.
- The 143 exit code occurs when the container receives a SIGTERM (graceful shutdown signal) from Amazon ECS and then exits.
- The 255 exit code occurs when the ENTRYPOINT or CMD command in your container fails because of an error. To confirm that your container failed because of an error, review your CloudWatch logs.
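Exit codes above 128 usually follow the shell convention of 128 plus the number of the signal that ended the process. On Linux, this is why 137 = 128 + 9 (SIGKILL), 139 = 128 + 11 (SIGSEGV), and 143 = 128 + 15 (SIGTERM). The following sketch decodes an exit code with Python's signal module and attaches a short note that restates the list above. The helper names are hypothetical, and the signal numbers are platform-specific, so run it on Linux to match container behavior.

```python
import signal
from typing import Optional

# Short notes that restate the exit-code list in this article.
EXIT_CODE_NOTES = {
    0: "ENTRYPOINT or CMD completed successfully",
    1: "Application error: review the application logs",
    137: "Force-stopped after SIGTERM timeout, or possible out-of-memory kill",
    139: "Segmentation fault: invalid memory access",
    143: "Stopped after a graceful shutdown signal",
    255: "ENTRYPOINT or CMD failed with an error",
}


def signal_for_exit_code(code: int) -> Optional[str]:
    """Return the signal name when code follows the shell's 128 + N convention."""
    if code <= 128:
        return None
    try:
        return signal.Signals(code - 128).name
    except ValueError:
        return None  # not a signal number on this platform (for example, 255 - 128 = 127)


def explain_exit_code(code: int) -> str:
    """Combine the article's note for code (if any) with the decoded signal name."""
    note = EXIT_CODE_NOTES.get(code, f"Exit code {code}: review the container logs")
    sig = signal_for_exit_code(code)
    return f"{note} ({sig})" if sig else note


for code in (0, 137, 139, 143):
    print(code, explain_exit_code(code))
```

Decoding the signal explains only how the process ended. To find out why it ended, you still need the container logs and CloudWatch metrics.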
For more troubleshooting steps, see How do I troubleshoot Amazon ECS tasks that stop or fail to start when my container exits?
CannotPullContainerError
This error occurs when the task fails to start because Amazon ECS can't retrieve the specified container image.
To resolve this issue for an Amazon Elastic Compute Cloud (Amazon EC2) launch type task, see How do I resolve "CannotPullContainerError" errors when I launch an EC2 task in Amazon ECS?
To resolve this issue for an Amazon ECS task that uses the Fargate launch type, see How do I resolve the "cannotpullcontainererror" error for my Amazon ECS tasks on Fargate?
Task failed Elastic Load Balancer health checks
To resolve this issue for tasks that use the EC2 launch type, see How do I get my Amazon ECS tasks that use the Amazon EC2 launch type to pass the Application Load Balancer health check?
To resolve this issue for tasks that use the Fargate launch type, see How do I troubleshoot health check failures for Amazon ECS tasks on Fargate?
Failed container health checks
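You define container health checks with the healthCheck parameter in the container definition (the HealthCheck data type in the Amazon ECS API). Amazon ECS doesn't monitor Docker health checks that are built into a container image but aren't specified in the container definition. The parameters map to the Docker HEALTHCHECK instruction. For more information, see HEALTHCHECK on the Docker website.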
To troubleshoot container health check errors, see How do I troubleshoot container health check failures for Amazon ECS tasks?
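A frequent cause of failed container health checks is a check that is too strict, for example a timeout that is shorter than the application needs, or no startPeriod for a slow-starting application. The following sketch builds a healthCheck object for a container definition and rejects values outside the ranges that the Amazon ECS HealthCheck reference documents (interval 5-300 seconds, timeout 2-60 seconds, retries 1-10, startPeriod 0-300 seconds). Confirm these ranges against the current API reference. The function name is hypothetical, and the curl command and URL in the example assume an image that includes curl and serves a /health endpoint on port 8080.

```python
from typing import Any

# Ranges from the Amazon ECS HealthCheck data type (seconds, except retries).
HEALTH_CHECK_RANGES = {
    "interval": (5, 300),
    "timeout": (2, 60),
    "retries": (1, 10),
    "startPeriod": (0, 300),
}


def build_health_check(
    command: list[str],
    interval: int = 30,
    timeout: int = 5,
    retries: int = 3,
    start_period: int = 0,
) -> dict[str, Any]:
    """Return a healthCheck dict for an ECS container definition."""
    if not command or command[0] not in ("CMD", "CMD-SHELL"):
        raise ValueError('command must start with "CMD" or "CMD-SHELL"')
    check = {
        "command": command,
        "interval": interval,
        "timeout": timeout,
        "retries": retries,
        "startPeriod": start_period,
    }
    for field, (low, high) in HEALTH_CHECK_RANGES.items():
        if not low <= check[field] <= high:
            raise ValueError(f"{field}={check[field]} is outside {low}-{high}")
    return check


# Example: give a slow-starting application 60 seconds before failures count.
example = build_health_check(
    ["CMD-SHELL", "curl -f http://localhost:8080/health || exit 1"],
    start_period=60,
)
print(example)
```

During the startPeriod, failed checks don't count toward the retries limit, so a longer start period is usually safer than a larger retries value for applications that need time to start.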
(instance i-##) (port #) is unhealthy in (reason Health checks failed)
This error occurs when an Amazon EC2 instance that's registered as a target doesn't respond to health checks on the specified port, so the load balancer marks the instance as unhealthy.
To troubleshoot this issue, take the following actions:
For more information about this error message, see Access logs for your Network Load Balancer.
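A Network Load Balancer TCP health check passes when the target accepts a TCP connection on the health check port. To check whether the port accepts connections at all, you can try the same kind of connection from a host that can reach the instance, such as another instance in the same VPC. The following sketch is a hypothetical helper, not an AWS tool. For HTTP or HTTPS health checks, it confirms only that the port accepts connections, not that the health check path returns a success code.

```python
import socket


def port_responds(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return False


# Usage (the private IP address and port are placeholders):
#   print(port_responds("10.0.1.25", 8080))
```

If the connection fails from inside the VPC, check the security groups and network ACLs for the instance, and confirm that the container is listening on the port that the target group uses.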
Service ABCService: ECS is performing maintenance on the underlying infrastructure hosting the task
This error occurs when Amazon ECS performs maintenance on the AWS Fargate infrastructure that runs your application containers. As a result, Amazon ECS stops the task. For tasks that are part of a service, Amazon ECS starts replacement tasks.
For more information, see Task retirement and maintenance for AWS Fargate on Amazon ECS.
For standalone tasks, see How do I take action on an Amazon ECS task retirement notice for a task that runs on Fargate?
Amazon ECS service scaling event activated
During a scale-in event, the scaling policy lowers the desired number of tasks for the service, and Amazon ECS stops tasks until the number of running tasks matches the new desired count. This event typically occurs when demand decreases and the service needs fewer tasks to handle the workload.
To resolve this issue, take the following actions:
- Create CloudWatch alarms for changes in your service or tasks.
- Review scheduled deployments that might affect your tasks.
To protect your tasks from scale-in event termination because of service auto scaling or deployments, use Amazon ECS task scale-in protection.
For more information, see How do I view and manage scheduled scaling actions for Amazon ECS services?
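Task scale-in protection is turned on with the UpdateTaskProtection API. The following sketch builds the keyword arguments for boto3's update_task_protection call and checks them against the documented limits, which are up to 10 tasks for each call and an expiration of up to 2,880 minutes (48 hours), with a default of 120 minutes. Confirm these limits in the current API reference. The function name is hypothetical, and the cluster name and task ARN in the usage comment are placeholders.

```python
from typing import Any

MAX_TASKS_PER_CALL = 10  # UpdateTaskProtection accepts up to 10 tasks per request
MAX_EXPIRY_MINUTES = 2880  # 48 hours


def task_protection_request(
    cluster: str,
    task_arns: list[str],
    enabled: bool = True,
    expires_in_minutes: int = 120,
) -> dict[str, Any]:
    """Build keyword arguments for the ECS UpdateTaskProtection call."""
    if not 1 <= len(task_arns) <= MAX_TASKS_PER_CALL:
        raise ValueError(f"pass between 1 and {MAX_TASKS_PER_CALL} task ARNs")
    request: dict[str, Any] = {
        "cluster": cluster,
        "tasks": task_arns,
        "protectionEnabled": enabled,
    }
    if enabled:
        if not 1 <= expires_in_minutes <= MAX_EXPIRY_MINUTES:
            raise ValueError(f"expires_in_minutes must be 1-{MAX_EXPIRY_MINUTES}")
        request["expiresInMinutes"] = expires_in_minutes
    return request


# Usage (needs boto3 and AWS credentials; names are placeholders):
#   import boto3
#   boto3.client("ecs").update_task_protection(
#       **task_protection_request("my-cluster", ["<task-arn>"], expires_in_minutes=60)
#   )
```

Turn protection off when the task finishes its critical work so that scale-in events and deployments can proceed normally.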
Task stopped by user
The task was stopped by a StopTask API call. To identify who made the call, find the StopTask event in AWS CloudTrail and review its userIdentity information.
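The following sketch extracts the caller from a CloudTrail LookupEvents response, where each event's CloudTrailEvent field holds the full event record as a JSON string. It assumes that you looked up events with the EventName attribute set to StopTask. The function name is hypothetical, and the reason field is present only if the caller passed one to StopTask.

```python
import json
from typing import Any


def stop_task_callers(lookup_response: dict[str, Any]) -> list[dict[str, Any]]:
    """Extract who called StopTask from a CloudTrail LookupEvents response."""
    callers = []
    for event in lookup_response.get("Events", []):
        # CloudTrailEvent holds the full event record as a JSON string.
        record = json.loads(event["CloudTrailEvent"])
        if record.get("eventName") != "StopTask":
            continue
        params = record.get("requestParameters") or {}
        callers.append(
            {
                "eventTime": record.get("eventTime"),
                "callerArn": record.get("userIdentity", {}).get("arn"),
                "task": params.get("task"),
                "reason": params.get("reason"),
            }
        )
    return callers


# Usage (needs boto3 and AWS credentials):
#   import boto3
#   response = boto3.client("cloudtrail").lookup_events(
#       LookupAttributes=[{"AttributeKey": "EventName", "AttributeValue": "StopTask"}]
#   )
#   for caller in stop_task_callers(response):
#       print(caller)
```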
ResourceInitializationError: errors
For the different ResourceInitializationError messages and their resolution steps, see Troubleshooting Amazon ECS ResourceInitializationError errors.
To troubleshoot the "ResourceInitializationError: unable to pull secrets or registry auth" error, see How do I resolve a "ResourceInitializationError" when I try to pull secrets or retrieve Amazon ECR authentication for ECS tasks?
To troubleshoot the "ResourceInitializationError: failed to validate logger args" error, see How do I resolve the "ResourceInitializationError: failed to validate logger args" error in Amazon ECS?
SpotInterruptionError
For more information about SpotInterruptionError, see Troubleshooting Amazon ECS SpotInterruption errors.
To troubleshoot this error, see How do I handle Spot termination notices in AWS Fargate Spot tasks?
OutOfMemoryError
This error occurs when a container exits because processes in the container use more memory than you allocated in the task definition.
To troubleshoot this error, see How do I troubleshoot OutOfMemory errors in Amazon ECS?
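To route a stopped task to the matching section of this article, you can match its stoppedReason and the reason of each container against known text, and fall back to the task's stopCode. The following sketch takes one task object from a DescribeTasks response. The substrings are assumptions based on the headings in this article, and the exact stoppedReason wording can differ, so treat the result as a starting point. The first match wins, so container-level errors such as OutOfMemoryError take priority over the generic "Essential container in task exited" message.

```python
from typing import Any, Optional

# (substring to look for, article heading). Order matters: first match wins.
REASON_PATTERNS = [
    ("CannotPullContainerError", "CannotPullContainerError"),
    ("ResourceInitializationError", "ResourceInitializationError"),
    ("OutOfMemoryError", "OutOfMemoryError"),
    ("ELB health checks", "Task failed Elastic Load Balancer health checks"),
    ("container health checks", "Failed container health checks"),
    ("performing maintenance", "ECS is performing maintenance on the underlying infrastructure hosting the task"),
    ("Scaling activity initiated", "Amazon ECS service scaling event activated"),
    ("Essential container in task exited", "Essential container in task exited"),
]

# Fallback on the task's stopCode when no substring matches.
STOP_CODE_HEADINGS = {
    "SpotInterruption": "SpotInterruptionError",
    "UserInitiated": "Task stopped by user",
    "EssentialContainerExited": "Essential container in task exited",
}


def classify_stopped_task(task: dict[str, Any]) -> Optional[str]:
    """Map one task from a DescribeTasks response to a heading in this article."""
    texts = [task.get("stoppedReason") or ""]
    texts += [container.get("reason") or "" for container in task.get("containers", [])]
    for needle, heading in REASON_PATTERNS:
        if any(needle in text for text in texts):
            return heading
    return STOP_CODE_HEADINGS.get(task.get("stopCode"))
```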
Error messages
If you receive an error message when you try to run a task, then take the following troubleshooting actions based on the error.
No Container Instances were found in your cluster
To resolve this issue, launch a container instance.
To review the container instances for your cluster, complete the following steps:
- Open the Amazon ECS console.
- In the navigation pane, choose Clusters.
- Select your cluster.
- Choose the Infrastructure tab.
- Review the Container instances section.
If there are no container instances, then see Why can't my Amazon EC2 instance join the Amazon ECS cluster?
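Even when the cluster lists container instances, Amazon ECS can't place tasks on an instance whose container agent is disconnected or whose status isn't ACTIVE (for example, DRAINING). The following sketch splits a DescribeContainerInstances response into instances that can accept tasks and instances that need attention. The function name is hypothetical, and the cluster name in the usage comment is a placeholder.

```python
from typing import Any


def split_container_instances(response: dict[str, Any]) -> tuple[list[str], list[str]]:
    """Split a DescribeContainerInstances response into usable and problem instances.

    An instance counts as usable only if its status is ACTIVE and its
    container agent is connected; anything else is returned for follow-up.
    """
    usable: list[str] = []
    needs_attention: list[str] = []
    for instance in response.get("containerInstances", []):
        ready = instance.get("status") == "ACTIVE" and instance.get("agentConnected") is True
        (usable if ready else needs_attention).append(instance.get("ec2InstanceId"))
    return usable, needs_attention


# Usage (needs boto3 and AWS credentials; "my-cluster" is a placeholder):
#   import boto3
#   ecs = boto3.client("ecs")
#   arns = ecs.list_container_instances(cluster="my-cluster")["containerInstanceArns"]
#   if not arns:
#       print("No container instances are registered with this cluster")
#   else:
#       described = ecs.describe_container_instances(cluster="my-cluster", containerInstances=arns)
#       print(split_container_instances(described))
```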
InvalidParameterException
To resolve this error message, check that the resources that your task definition references exist and that their ARNs are correct. Also verify that the task role and task execution role have the required permissions.
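A quick first check is whether the role ARNs in the task definition are well formed. The following sketch inspects the taskRoleArn and executionRoleArn fields of a task definition dictionary and flags values that don't look like IAM role ARNs. It's an assumption-based shape check for the aws, aws-cn, and aws-us-gov partitions only. It can't confirm that a role exists or has the required permissions, so use IAM for that. The function name is hypothetical.

```python
import re
from typing import Any

# arn:<partition>:iam::<12-digit account ID>:role/<optional path/>name
ROLE_ARN = re.compile(r"arn:aws(?:-cn|-us-gov)?:iam::\d{12}:role/[A-Za-z0-9+=,.@_/-]+")


def check_role_arns(task_definition: dict[str, Any]) -> list[str]:
    """Return a message for each role ARN field that isn't shaped like an IAM role ARN."""
    problems = []
    for field in ("taskRoleArn", "executionRoleArn"):
        value = task_definition.get(field)
        if value is not None and not ROLE_ARN.fullmatch(value):
            problems.append(f"{field} does not look like an IAM role ARN: {value!r}")
    return problems
```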
You've reached the limit of the number of tasks that you can run concurrently
This error occurs when you exceed an Amazon ECS service quota. To troubleshoot this issue, see How do I resolve Amazon ECS service quota issues?
Related information
Resolve Amazon ECS stopped task errors
Viewing Amazon ECS stopped task errors