The Amazon Elastic Container Service (Amazon ECS) deployment circuit breaker set my deployment state to FAILED. I want to troubleshoot what caused the deployment to fail.
Short description
When the number of consecutive failures in a deployment reaches the defined threshold, the deployment circuit breaker sets the deployment state to FAILED. You might receive the following error message:
"Resource handler returned message: "Error occurred during operation 'ECS Deployment Circuit Breaker was triggered'." (RequestToken: xxxxxxxx-xxxx-xxxxxx-xxxxxxx, HandlerErrorCode: GeneralServiceException)"
The following issues can cause your deployment to fail:
- A container failed the health check.
- A target group failed the Application Load Balancer health checks.
- The Amazon Elastic Container Registry (Amazon ECR) image doesn't exist.
- Your container instances didn't meet all the requirements.
- A task stopped or failed to start.
Resolution
To troubleshoot this issue, check the Amazon ECS service event messages to identify why Amazon ECS activated the circuit breaker. Then, take the following troubleshooting actions based on the reason.
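The service events and each deployment's rollout state are both returned by the DescribeServices API. The following Python sketch, which assumes boto3 is installed and credentials are configured, shows one way to read them. The function names summarize_service and fetch_service_summary are illustrative, not part of any AWS SDK.

```python
def summarize_service(service, max_events=10):
    """Summarize one entry of describe_services()["services"].

    Returns each deployment's rollout state and the newest event messages.
    """
    deployments = [
        {
            "id": d.get("id"),
            "rolloutState": d.get("rolloutState"),
            "rolloutStateReason": d.get("rolloutStateReason"),
        }
        for d in service.get("deployments", [])
    ]
    # Sort explicitly by timestamp so the result does not depend on response order.
    newest_first = sorted(
        service.get("events", []), key=lambda e: e["createdAt"], reverse=True
    )
    return {
        "deployments": deployments,
        "events": [e["message"] for e in newest_first[:max_events]],
    }


def fetch_service_summary(cluster, service_name):
    """Call DescribeServices for one service and summarize the result."""
    import boto3  # imported here so summarize_service works without AWS access

    ecs = boto3.client("ecs")
    response = ecs.describe_services(cluster=cluster, services=[service_name])
    if not response["services"]:
        raise LookupError(f"service {service_name!r} not found in cluster {cluster!r}")
    return summarize_service(response["services"][0])
```

A deployment whose rolloutState is FAILED is the one that the circuit breaker stopped. The event messages returned with it usually point to one of the causes in the following sections.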
A container failed the health check
If the Amazon ECS containers in your task can't pass the health checks, then you receive the following error message:
"(service AWS-Service) (task ff3e71a4-d7e5-428b-9232-2345657889) failed container health checks."
To resolve this issue, see How do I troubleshoot container health check failures for Amazon ECS tasks?
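To see which container in the task reported an unhealthy status, you can pass the task ID from the error message to the DescribeTasks API. The following is a minimal sketch that assumes boto3 and configured credentials. The helper names unhealthy_containers and fetch_unhealthy_containers are hypothetical.

```python
def unhealthy_containers(tasks_response):
    """List containers whose healthStatus is UNHEALTHY in a DescribeTasks response."""
    found = []
    for task in tasks_response.get("tasks", []):
        for container in task.get("containers", []):
            if container.get("healthStatus") == "UNHEALTHY":
                found.append(
                    {
                        "taskArn": task.get("taskArn"),
                        "container": container.get("name"),
                        "exitCode": container.get("exitCode"),
                    }
                )
    return found


def fetch_unhealthy_containers(cluster, task_id):
    """Describe one task (ID or ARN) and return its unhealthy containers."""
    import boto3  # imported here so unhealthy_containers works without AWS access

    ecs = boto3.client("ecs")
    return unhealthy_containers(ecs.describe_tasks(cluster=cluster, tasks=[task_id]))
```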
A target group failed the Application Load Balancer health checks
To resolve this issue, complete the following steps:
- Verify that you correctly configured your target group's health check settings.
- Make sure that your application correctly responds to the specified health check request. Also, make sure that no network or security group issues block the health check requests.
For more information, see How do I troubleshoot failed health checks for Application Load Balancers?
Note: Amazon ECS initiates a rollback only when health check failures are consecutive.
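To check which targets the load balancer currently treats as unhealthy, and why, you can query the Elastic Load Balancing DescribeTargetHealth API. The following sketch assumes boto3 and configured credentials. The target group ARN is a value that you supply, and the helper names are illustrative.

```python
def not_healthy_targets(response):
    """Return every target whose state is anything other than 'healthy'."""
    results = []
    for description in response.get("TargetHealthDescriptions", []):
        health = description.get("TargetHealth", {})
        if health.get("State") != "healthy":
            target = description.get("Target", {})
            results.append(
                {
                    "id": target.get("Id"),
                    "port": target.get("Port"),
                    "state": health.get("State"),
                    "reason": health.get("Reason"),
                    "description": health.get("Description"),
                }
            )
    return results


def fetch_not_healthy_targets(target_group_arn):
    """Call DescribeTargetHealth for one target group and filter the result."""
    import boto3  # imported here so not_healthy_targets works without AWS access

    elbv2 = boto3.client("elbv2")
    response = elbv2.describe_target_health(TargetGroupArn=target_group_arn)
    return not_healthy_targets(response)
```

The reason and description fields typically indicate whether the target timed out, returned an unexpected response code, or is still registering.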
The Amazon ECR image doesn't exist
To resolve this issue, complete the following steps:
- Verify that the image URI in your task definition is correct and that the image exists in your Amazon ECR repository or other container registry.
- Make sure that your Amazon ECS task execution IAM role has the correct permissions to pull images from Amazon ECR.
- Check for network connectivity issues between your Amazon ECS cluster and the container registry.
For more information, see How do I resolve the "Image does not exist" error when my tasks fail to start in my Amazon ECS cluster?
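The first check in the preceding list can be scripted for images hosted in a private Amazon ECR repository. The following sketch parses the image URI and asks the DescribeImages API whether the tag or digest exists. It assumes boto3, configured credentials, and a URI in the account.dkr.ecr.region.amazonaws.com/repository form. The function names are hypothetical, and treating a missing tag as "latest" is an assumption that mirrors common container tooling.

```python
import re

_ECR_URI = re.compile(
    r"(?P<account>\d{12})\.dkr\.ecr\.(?P<region>[a-z0-9-]+)\.amazonaws\.com(?:\.cn)?"
    r"/(?P<repo>[^:@]+)(?::(?P<tag>[^@]+))?(?:@(?P<digest>sha256:[0-9a-f]+))?"
)


def parse_ecr_image_uri(uri):
    """Split a private ECR image URI into its parts, or return None if it is not one."""
    match = _ECR_URI.fullmatch(uri)
    return match.groupdict() if match else None


def describe_images_kwargs(parts):
    """Build DescribeImages arguments; a digest takes precedence over a tag."""
    if parts["digest"]:
        image_id = {"imageDigest": parts["digest"]}
    else:
        image_id = {"imageTag": parts["tag"] or "latest"}  # assumed default tag
    return {
        "registryId": parts["account"],
        "repositoryName": parts["repo"],
        "imageIds": [image_id],
    }


def image_exists(uri):
    """Return True if the image is found, False if the image or repository is missing."""
    import boto3  # imported here so the parsing helpers work without AWS access

    parts = parse_ecr_image_uri(uri)
    if parts is None:
        raise ValueError(f"not a private ECR image URI: {uri!r}")
    ecr = boto3.client("ecr", region_name=parts["region"])
    try:
        ecr.describe_images(**describe_images_kwargs(parts))
    except (
        ecr.exceptions.ImageNotFoundException,
        ecr.exceptions.RepositoryNotFoundException,
    ):
        return False
    # Permission or network errors are not caught, so they surface to the caller.
    return True
```

This check runs with your own credentials, not the task execution role, so a True result doesn't rule out the permission and network causes in the preceding list.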
Your container instances didn't meet all the requirements
To resolve this issue, see How do I resolve the "no container instance met all of its requirements" error in Amazon ECS?
A task stopped or failed to start
To resolve this issue, complete the following steps:
- Use Amazon CloudWatch Logs Insights to review your logs, and call the DescribeTasks API to get the task's stoppedReason.
- Confirm that the cluster has active instances.
- Make sure that the CPU and memory that the task requires don't exceed the CPU and memory that are available on the container instance.
For more information, see Why is my Amazon ECS task stopped? and Why do the tasks in my Amazon ECS cluster fail to start?
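The stoppedReason lookup and the CPU and memory comparison from the preceding steps can be sketched in Python as follows. The example assumes boto3 and configured credentials, and the helper names are illustrative. The fits_on_instance function expects the task's CPU units and memory in MiB as integers, and one entry from a DescribeContainerInstances response.

```python
def stopped_reasons(tasks_response):
    """Collect stopCode, stoppedReason, and per-container exit details."""
    rows = []
    for task in tasks_response.get("tasks", []):
        rows.append(
            {
                "taskArn": task.get("taskArn"),
                "stopCode": task.get("stopCode"),
                "stoppedReason": task.get("stoppedReason"),
                "containers": [
                    {
                        "name": c.get("name"),
                        "exitCode": c.get("exitCode"),
                        "reason": c.get("reason"),
                    }
                    for c in task.get("containers", [])
                ],
            }
        )
    return rows


def fits_on_instance(task_cpu, task_memory, container_instance):
    """Compare a task's CPU units and memory (MiB) with an instance's remainingResources."""
    remaining = {
        r["name"]: r.get("integerValue", 0)
        for r in container_instance.get("remainingResources", [])
    }
    return task_cpu <= remaining.get("CPU", 0) and task_memory <= remaining.get("MEMORY", 0)


def fetch_stopped_reasons(cluster, service_name):
    """List the service's stopped tasks, then describe them."""
    import boto3  # imported here so the helpers above work without AWS access

    ecs = boto3.client("ecs")
    arns = ecs.list_tasks(
        cluster=cluster, serviceName=service_name, desiredStatus="STOPPED"
    )["taskArns"]
    if not arns:
        return []  # DescribeTasks needs at least one task
    return stopped_reasons(ecs.describe_tasks(cluster=cluster, tasks=arns))
```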
Related information
Announcing Amazon ECS deployment circuit breaker