Why has my Amazon ECS task stopped?

I want to troubleshoot my Amazon Elastic Container Service (Amazon ECS) task that stopped.

Resolution

Use the DescribeTasks API to view the details of a stopped task. To troubleshoot your task, check the stopped reason and exit code. The details for the stopped task appear only for 1 hour in the API results. To allow more time to view stopped task details, use the amazon-ecs-stopped-tasks-cwlogs template on the GitHub website.
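The following Python sketch illustrates the check described above: it pulls the stopped reason and the per-container exit codes out of a DescribeTasks response. It assumes that you already retrieved the response (for example, with the boto3 describe_tasks call that appears only in a comment). The sample_response values, cluster name, and task ARN are made up for illustration.

```python
def summarize_stopped_task(task: dict) -> dict:
    """Collect the fields that usually explain why a task stopped."""
    containers = []
    for container in task.get("containers", []):
        containers.append(
            {
                "name": container.get("name"),
                # exitCode can be missing when the container never started.
                "exitCode": container.get("exitCode"),
                "reason": container.get("reason"),
            }
        )
    return {
        "taskArn": task.get("taskArn"),
        "stopCode": task.get("stopCode"),
        "stoppedReason": task.get("stoppedReason"),
        "containers": containers,
    }


# With boto3 installed, the response could come from a call such as:
#   boto3.client("ecs").describe_tasks(cluster="my-cluster", tasks=[task_arn])
# The response below is hypothetical sample data, not real output.
sample_response = {
    "tasks": [
        {
            "taskArn": "arn:aws:ecs:us-east-1:111122223333:task/my-cluster/0123456789abcdef",
            "stopCode": "EssentialContainerExited",
            "stoppedReason": "Essential container in task exited",
            "containers": [
                {
                    "name": "web",
                    "exitCode": 137,
                    "reason": "OutOfMemoryError: Container killed due to memory usage",
                }
            ],
        }
    ]
}

for task in sample_response["tasks"]:
    print(summarize_stopped_task(task))
```

Because the stopped task details are available for a limited time, a summary like this is most useful when you run it soon after the task stops.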

Reasons for a stopped task

Essential container in task exited

If your essential container exited, then use one of the following exit codes to troubleshoot your issue:

  • The 0 exit code occurs when the ENTRYPOINT or CMD command completes successfully and the container stops.
  • The 1 exit code occurs when there's an application error. To troubleshoot this exit code, review your application logs.
  • The 137 exit code occurs when the container doesn't respond to a SIGTERM signal within the default 30-second period, and Amazon ECS forces the container to exit with a SIGKILL signal. To change the default 30-second period, update the ECS_CONTAINER_STOP_TIMEOUT parameter on the ECS container agent.
    Note: This exit code can also occur because of an Out-of-Memory (OOM) error. To check your resource usage, review your Amazon CloudWatch metrics for Amazon ECS.
  • The 139 exit code occurs when the application tries to access a memory region that isn't available. This exit code also occurs when an unset or invalid environment variable causes a segmentation fault. To troubleshoot this issue, review the CloudWatch logs for your Amazon ECS task.
  • The 143 exit code occurs when the container received a graceful shutdown warning, and Amazon ECS shut down the container.
  • The 255 exit code occurs when the ENTRYPOINT or CMD command in your container fails because of an error. To confirm that your container failed because of an error, review your CloudWatch logs.
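Exit codes above 128 in the preceding list follow the common convention of 128 plus the signal number: 137 is 128 + 9 (SIGKILL), 139 is 128 + 11 (SIGSEGV), and 143 is 128 + 15 (SIGTERM). The following Python sketch decodes an exit code on that basis. The hint text paraphrases the list above, and the function name is illustrative only.

```python
# Standard Linux signal numbers for the signals mentioned in the list above.
SIGNAL_NAMES = {9: "SIGKILL", 11: "SIGSEGV", 15: "SIGTERM"}

# Short hints paraphrased from the exit code list in this article.
EXIT_CODE_HINTS = {
    0: "ENTRYPOINT or CMD completed and the container stopped",
    1: "application error; review the application logs",
    137: "forced stop after the stop timeout, or an out-of-memory kill",
    139: "segmentation fault; review the container logs",
    143: "graceful shutdown after a SIGTERM signal",
    255: "ENTRYPOINT or CMD failed because of an error",
}


def describe_exit_code(code: int) -> str:
    """Return a one-line explanation for a container exit code."""
    hint = EXIT_CODE_HINTS.get(code, "not covered in the list above")
    signal_number = code - 128
    if signal_number in SIGNAL_NAMES:
        name = SIGNAL_NAMES[signal_number]
        return f"{code} = 128 + {signal_number} ({name}): {hint}"
    return f"{code}: {hint}"


for exit_code in (0, 1, 137, 139, 143, 255):
    print(describe_exit_code(exit_code))
```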

For more troubleshooting steps, see How do I troubleshoot Amazon ECS tasks that stop or fail to start when my container exits?
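A common way to avoid the 137 exit code is for the application to handle SIGTERM and exit before the stop timeout ends. The following Python sketch shows one possible pattern for a worker-style container. It assumes that the Python process receives the signal directly (for example, it runs as the container's main process). The GracefulShutdown class, run_worker function, and process_next_message name are hypothetical.

```python
import signal
import time


class GracefulShutdown:
    """Record a SIGTERM so that the main loop can finish and exit cleanly."""

    def __init__(self) -> None:
        self.stop_requested = False

    def handle(self, signum, frame) -> None:
        # Keep the handler small: set a flag and let the main loop react.
        self.stop_requested = True

    def install(self) -> None:
        # signal.signal must be called from the main thread.
        signal.signal(signal.SIGTERM, self.handle)


def run_worker(shutdown: GracefulShutdown, do_work, idle_seconds: float = 1.0) -> int:
    """Call do_work() repeatedly until a shutdown is requested, then return 0."""
    while not shutdown.stop_requested:
        do_work()
        time.sleep(idle_seconds)
    # Release resources here (flush logs, close connections) before exiting.
    return 0


# Possible use in the container entry point (process_next_message is hypothetical):
#   shutdown = GracefulShutdown()
#   shutdown.install()
#   sys.exit(run_worker(shutdown, process_next_message))
```

With this pattern, the cleanup work after the loop must finish within the stop timeout. Otherwise, the container can still receive SIGKILL.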

CannotPullContainerError

This error occurs when the task failed to start because Amazon ECS can't retrieve the specified container image.

To resolve this issue for an Amazon Elastic Compute Cloud (Amazon EC2) launch type task, see How do I resolve "CannotPullContainerError" errors when I launch an EC2 task in Amazon ECS?

To resolve this issue for an Amazon ECS task that uses the Fargate launch type, see How do I resolve the "cannotpullcontainererror" error for my Amazon ECS tasks on Fargate?

Task failed Elastic Load Balancer health checks

To resolve this issue for tasks that use the EC2 launch type, see How do I get my Amazon ECS tasks that use the Amazon EC2 launch type to pass the Application Load Balancer health check?

To resolve this issue for tasks that use the Fargate launch type, see How do I troubleshoot health check failures for Amazon ECS tasks on Fargate?

Failed container health checks

You define container health checks with the HealthCheck parameter of the Amazon ECS API or in the Dockerfile. For more information, see HEALTHCHECK on the Docker website.

To troubleshoot container health check errors, see How do I troubleshoot container health check failures for Amazon ECS tasks?

(instance i-##) (port #) is unhealthy in (reason Health checks failed)

This error occurs when an unhealthy Amazon EC2 instance doesn't respond to health checks on the specified port.

To troubleshoot this issue, take the following actions:

  • Verify that the security group that's attached to the container instance allows the required traffic.
  • Run the following command to confirm that the backend responds without delay:
    curl -iv localhost:container-port/path
    Note: Replace container-port with your container port and path with the health check path.
  • If the backend needs more time to respond, then increase the health check response timeout value on the load balancer's target group.
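As an alternative to the curl check above, the following Python sketch sends one request to the health check path and reports the status code and response time. It's an illustration only: the URL in the usage comment is a placeholder, and the default healthy_codes value of 200 is an assumption that you should align with your target group's settings. The opener parameter exists so that you can test the function without a running backend.

```python
import time
import urllib.error
import urllib.request


def probe_health(url, timeout=5.0, healthy_codes=(200,), opener=urllib.request.urlopen):
    """Request url once and report the status code and elapsed seconds."""
    started = time.monotonic()
    try:
        with opener(url, timeout=timeout) as response:
            status = response.status
    except urllib.error.HTTPError as error:
        # The backend answered, but with an error status code.
        status = error.code
    except (urllib.error.URLError, OSError) as error:
        # Connection refused, DNS failure, timeout, and similar problems.
        return {
            "ok": False,
            "status": None,
            "elapsed": time.monotonic() - started,
            "error": str(error),
        }
    return {
        "ok": status in healthy_codes,
        "status": status,
        "elapsed": time.monotonic() - started,
        "error": None,
    }


# Example call from the container instance (port and path are placeholders):
#   print(probe_health("http://localhost:8080/health"))
```

Compare the elapsed value with the health check timeout that's configured for your target group. A response that's slower than the timeout counts as a failed health check.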

For more information about this error message, see Access logs for your Network Load Balancer.

Service ABCService: ECS is performing maintenance on the underlying infrastructure hosting the task

This error occurs when Amazon ECS performs maintenance on the AWS Fargate servers that run your application containers. As a result, your Amazon ECS service is temporarily unavailable.

For more information, see Task retirement and maintenance for AWS Fargate on Amazon ECS.

For standalone tasks, see How do I take action on an Amazon ECS task retirement notice for a task that runs on Fargate?

Amazon ECS service scaling event activated

During an Amazon ECS service scale-in event, the scaling policy reduces the desired number of tasks in the service, and Amazon ECS stops tasks to reach the new number. This action typically occurs when demand decreases and the service requires fewer tasks to handle the workload.

To resolve this issue, take the following actions:

  • Create CloudWatch alarms for changes in your service or tasks.
  • Review scheduled deployments that might affect your tasks.

To protect your tasks from scale-in event termination because of service auto scaling or deployments, use Amazon ECS task scale-in protection.

For more information, see How do I view and manage scheduled scaling actions for Amazon ECS services?

Task stopped by user

The task received a StopTask API call. To identify the user who initiated the call, view the StopTask event in AWS CloudTrail and check the userIdentity information.
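If you export CloudTrail event records as JSON, then you can filter them for StopTask calls. The following Python sketch shows one way to do that. It assumes the standard eventName, eventTime, and userIdentity fields of a CloudTrail record. It also assumes that requestParameters mirrors the StopTask request (task and reason), which you should confirm against your own events. The sample records are made up.

```python
def find_stop_task_callers(events: list) -> list:
    """Return who called StopTask, based on CloudTrail event records."""
    callers = []
    for event in events:
        if event.get("eventName") != "StopTask":
            continue
        identity = event.get("userIdentity") or {}
        # Assumption: requestParameters carries the StopTask request fields.
        params = event.get("requestParameters") or {}
        callers.append(
            {
                "eventTime": event.get("eventTime"),
                "caller": identity.get("arn") or identity.get("principalId"),
                "identityType": identity.get("type"),
                "task": params.get("task"),
                "reason": params.get("reason"),
            }
        )
    return callers


# Hypothetical records for illustration only.
sample_events = [
    {
        "eventName": "StopTask",
        "eventTime": "2024-01-01T12:00:00Z",
        "userIdentity": {
            "type": "IAMUser",
            "arn": "arn:aws:iam::111122223333:user/example-user",
        },
        "requestParameters": {"cluster": "my-cluster", "task": "0123456789abcdef"},
    },
    {"eventName": "RunTask", "eventTime": "2024-01-01T11:00:00Z"},
]

for caller in find_stop_task_callers(sample_events):
    print(caller)
```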

ResourceInitializationError errors

For different ResourceInitializationError messages and resolution steps, see Troubleshooting Amazon ECS ResourceInitializationError errors.

To troubleshoot the "ResourceInitializationError: unable to pull secrets or registry auth" error, see How do I resolve a "ResourceInitializationError" when I try to pull secrets or retrieve Amazon ECR authentication for ECS tasks?

To troubleshoot the "ResourceInitializationError: failed to validate logger args" error, see How do I resolve the "ResourceInitializationError: failed to validate logger args" error in Amazon ECS?

SpotInterruptionError

For more information about SpotInterruptionError, see Troubleshooting Amazon ECS SpotInterruption errors.

To troubleshoot this error, see How do I handle Spot termination notices in AWS Fargate Spot tasks?

OutOfMemoryError

This error occurs when a container exits because processes in the container use more memory than you allocated in the task definition.

To troubleshoot this error, see How do I troubleshoot OutOfMemory errors in Amazon ECS?

Error messages

If you receive an error message when your task stops, then take the following troubleshooting actions based on the error. 

No Container Instances were found in your cluster

To resolve this issue, launch a container instance.

To review the container instances for your cluster, complete the following steps:

  1. Open the Amazon ECS console.
  2. In the navigation pane, choose Clusters.
  3. Select your cluster.
  4. Choose the Infrastructure tab.
  5. Review the Container instances section.

If there are no container instances, then see Why can't my Amazon EC2 instance join the Amazon ECS cluster?

InvalidParameterException

To resolve this error message, check that the parameters in your TaskDefinition exist and have the correct ARNs. Verify that the task role and task execution role have the required permissions.
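Before you check whether the referenced resources exist, it can help to confirm that each ARN in the task definition is well formed. The following Python sketch walks a task definition that's loaded as a dictionary and reports strings that start with "arn:" but don't have the six colon-separated ARN fields. It checks only the shape of the ARN, not whether the resource exists or whether the roles have the required permissions. The sample task definition is hypothetical.

```python
def find_malformed_arns(value, path="taskDefinition"):
    """Yield (path, value) for ARN-like strings that lack the six ARN fields."""
    if isinstance(value, dict):
        for key, item in value.items():
            yield from find_malformed_arns(item, f"{path}.{key}")
    elif isinstance(value, list):
        for index, item in enumerate(value):
            yield from find_malformed_arns(item, f"{path}[{index}]")
    elif isinstance(value, str) and value.startswith("arn:"):
        # Expected layout: arn:partition:service:region:account-id:resource
        parts = value.split(":", 5)
        if len(parts) < 6 or not all((parts[1], parts[2], parts[5])):
            yield path, value


# Hypothetical task definition; the taskRoleArn is missing a colon on purpose.
sample_task_definition = {
    "executionRoleArn": "arn:aws:iam::111122223333:role/ecsTaskExecutionRole",
    "taskRoleArn": "arn:aws:iam:111122223333:role/broken",
    "containerDefinitions": [
        {
            "name": "web",
            "secrets": [
                {
                    "name": "DB_PASSWORD",
                    "valueFrom": "arn:aws:secretsmanager:us-east-1:111122223333:secret:db-AbCdEf",
                }
            ],
        }
    ],
}

for location, arn in find_malformed_arns(sample_task_definition):
    print(f"Check {location}: {arn}")
```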

You've reached the limit of the number of tasks that you can run concurrently

This error occurs when you exceed an Amazon ECS service quota. To troubleshoot this issue, see How do I resolve Amazon ECS service quota issues?

Related information

Resolve Amazon ECS stopped task errors

Viewing Amazon ECS stopped task errors

AWS OFFICIAL
Updated 2 months ago
4 Comments

I am getting errors from the ECS service with error code 139. When I check from inside the container using ECS execute-command, the memory usage for the container is around 700 MB. The ECS task definition assigns 8 GB, with a soft limit of 1024 and a hard limit of 8000.

Is there any way to troubleshoot this error in the AWS ECS service?

Also, I noticed that even though I set 8 GB as the container memory in the task definition, the total memory shows as 16 GB when I run free -m using ECS execute-command. Why is that?

replied 2 years ago

Thank you for your comment. We'll review and update the Knowledge Center article as needed.

AWS
MODERATOR
replied 2 years ago

I am getting an error with exit code 0. I am able to successfully call the task and get the response. After that, the task automatically stops and a new one is created. I want to keep the same task because the calls must be made from the same IP address.

replied 5 months ago

Thank you for your comment. We'll review and update the Knowledge Center article as needed.

AWS
MODERATOR
replied 5 months ago