When I deploy a new service in my Amazon Elastic Container Service (Amazon ECS) cluster, tasks are incorrectly deployed or terminated.
Resolution
Check that you're using a container with enough resource capacity
If new deployments consistently fail load balancer health checks, then check the CpuUtilized, CpuReserved, and MemoryUtilized Amazon ECS Container Insights metrics to confirm that your tasks have enough CPU and memory. If you use an Application Load Balancer, then also check the TargetResponseTime metric.
Also, configure the HealthCheckGracePeriodSeconds property with a value that's longer than your application's startup time. Delayed health check responses can cause failures and prompt Amazon ECS to cycle tasks.
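For example, you can set the grace period when you update the service. The following command is a sketch; the cluster name, service name, and 300-second value are placeholders that you must adjust to your environment:

```shell
# Sketch: give containers 300 seconds to start before Amazon ECS
# evaluates load balancer health checks. ECS_CLUSTER and MY_SERVICE
# are placeholder names.
aws ecs update-service \
  --cluster ECS_CLUSTER \
  --service MY_SERVICE \
  --health-check-grace-period-seconds 300
```

Choose a grace period that covers your application's worst-case startup time, including dependency initialization.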
Check the container status and exit codes
Note: If you receive errors when you run AWS Command Line Interface (AWS CLI) commands, then see Troubleshooting errors for the AWS CLI. Also, make sure that you're using the most recent AWS CLI version.
For tasks that terminate unexpectedly, run the following describe-tasks command to check the container exit codes:
aws ecs describe-tasks --cluster ECS_CLUSTER --tasks TASK_ARN --region REGION
Note: Replace ECS_CLUSTER with your cluster name, TASK_ARN with the task ARN, and REGION with your AWS Region.
If the container exit code in the output is 0, then the container exited successfully. If the exit code is 1, then there was an application error, and you must check your application logs. If the exit code is 137, then the container received a SIGKILL signal, typically because of an out-of-memory condition. To resolve this issue, see How do I troubleshoot OutOfMemory errors in Amazon ECS? and Why is my Amazon ECS task stopped?
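To filter the describe-tasks output down to only the fields that matter here, you can add a --query expression. This is a sketch; ECS_CLUSTER, TASK_ARN, and REGION are placeholders:

```shell
# Sketch: return only the container name, exit code, and stopped
# reason for each container in the task.
aws ecs describe-tasks \
  --cluster ECS_CLUSTER \
  --tasks TASK_ARN \
  --region REGION \
  --query "tasks[].containers[].{name:name,exitCode:exitCode,reason:reason}"
```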
Review task definition configuration
Make sure that all task definition configurations are valid, especially after you modify or create new task definitions. To verify your task definition configurations, run a standalone task.
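The following run-task command is one way to launch a standalone task for this check. The cluster, task definition, subnet, and security group values are placeholders, and the launch type assumes AWS Fargate:

```shell
# Sketch: run one standalone copy of the task definition to confirm
# that it starts cleanly outside the service. All names are placeholders.
aws ecs run-task \
  --cluster ECS_CLUSTER \
  --task-definition MY_TASK_DEF:1 \
  --count 1 \
  --launch-type FARGATE \
  --network-configuration "awsvpcConfiguration={subnets=[subnet-EXAMPLE],securityGroups=[sg-EXAMPLE],assignPublicIp=ENABLED}"
```

If the standalone task also fails, then the issue is in the task definition or the container image rather than the service or load balancer configuration.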
Configure your load balancer settings
Configure deregistration_delay.timeout_seconds on the load balancer's target group to meet your needs. For long-lived requests, use a higher value. For Amazon Elastic Compute Cloud (Amazon EC2) Spot Instances, this value must be lower than 120 seconds because Spot Instances receive only a 2-minute interruption notice.
To modify deregistration_delay.timeout_seconds, run the following modify-target-group-attributes command:
aws elbv2 modify-target-group-attributes --target-group-arn EXAMPLE_ARN --attributes Key=deregistration_delay.timeout_seconds,Value=120
Note: Replace EXAMPLE_ARN with your target group ARN and 120 with the deregistration delay timeout in seconds.
Also, optimize the load balancer's health check settings. If the settings are too strict for your application, then the load balancer might frequently mark the target as unhealthy.
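For example, you can relax the target group's health check thresholds so that slow-starting targets aren't marked unhealthy too quickly. The following values are placeholders to tune to your application's startup behavior:

```shell
# Sketch: loosen health check timing on the target group.
# EXAMPLE_ARN and all numeric values are placeholders.
aws elbv2 modify-target-group \
  --target-group-arn EXAMPLE_ARN \
  --health-check-interval-seconds 30 \
  --health-check-timeout-seconds 10 \
  --healthy-threshold-count 2 \
  --unhealthy-threshold-count 5
```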
Configure your application's graceful shutdown time
By default, ECS_CONTAINER_STOP_TIMEOUT is set to 30 seconds. If you access your application while the previous tasks drain during a new Amazon ECS deployment, then you might receive a 5xx error. If you encounter these errors, then increase the stop timeout based on your application's graceful shutdown needs. It's a best practice to test the updated value in a test environment before you deploy it.
To update the value, add code similar to the following example to your task definition:
{
  "containerDefinitions": [
    {
      "name": "your-container",
      "image": "your-image",
      "stopTimeout": 120
    }
  ]
}
Note: Replace 120 with your stop timeout value in seconds.
To configure a graceful shutdown, add a SIGTERM handler to your application. For AWS Fargate Spot tasks, configure the SIGTERM handler to call the DeregisterTargets API. This makes sure that Amazon ECS deregisters FARGATE_SPOT tasks from the load balancer's target group before the tasks are interrupted.
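The following POSIX shell sketch shows the shape of a container entrypoint that traps SIGTERM. The drain step is a placeholder echo, and the self-signal only simulates Amazon ECS stopping the task; in a real container, your application drains connections (and, for Fargate Spot, calls DeregisterTargets) inside the handler:

```shell
#!/bin/sh
# Minimal sketch: trap SIGTERM so the application can drain in-flight
# work before Amazon ECS sends SIGKILL at stopTimeout.
terminated=0
on_sigterm() {
  terminated=1
  echo "SIGTERM received, draining connections"
}
trap 'on_sigterm' TERM

# Simulate Amazon ECS stopping the task by signaling this shell.
kill -TERM $$
echo "terminated=$terminated"
```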
Related information
Optimize load balancer connection draining parameters for Amazon ECS