How do I troubleshoot issues related to blue/green deployments in Amazon ECS?

I want to troubleshoot issues related to blue/green deployments for services hosted on Amazon Elastic Container Service (Amazon ECS).

Short description

The most common issues related to blue/green deployments for services hosted on Amazon ECS involve the following services:

  • AWS Identity and Access Management (IAM)
  • Load balancer or Amazon ECS
  • AWS CloudFormation

Resolution

To troubleshoot these issues, complete the following tasks based on your use case.

IAM related issues

When you use IAM with blue/green deployments in Amazon ECS, you might see the following issues:

You can't create your ECS service because you get the following error message: "Please create your Service role for CodeDeploy"

If you get this message, then AWS CodeDeploy doesn't have the required IAM permissions to use the blue/green deployment strategy. Grant the CodeDeploy IAM role permissions to update your Amazon ECS service on your behalf. Or, to create an IAM role for CodeDeploy, complete the following steps:

  1. Open the IAM console.
  2. In the navigation pane, choose Roles.
  3. Choose Create role.
  4. In the Select type of trusted entity section, choose AWS service, and then choose CodeDeploy.
  5. In the Select your use case section, choose CodeDeploy - ECS, and then choose Next: Permissions.
    Note: Keep the default AWSCodeDeployRoleForECS policy. This policy includes the permissions that CodeDeploy requires to interact with Amazon ECS and other services.
  6. Choose Next: Tags.
  7. (Optional) Enter a tag name, and then choose Next: Review.
  8. For Role name, enter ecsCodeDeployRole.
  9. Choose Create role.
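
For teams that script this setup instead of using the console, the steps above can be sketched in Python. This is a minimal sketch, not a definitive implementation: it assumes that iam_client is a boto3 IAM client (for example, boto3.client("iam")), that the role name ecsCodeDeployRole from the steps is acceptable, and that the managed policy ARN is arn:aws:iam::aws:policy/AWSCodeDeployRoleForECS. The helper names are hypothetical.

```python
import json

# Assumed ARN of the managed policy that the console attaches for the
# "CodeDeploy - ECS" use case.
CODEDEPLOY_ECS_POLICY_ARN = "arn:aws:iam::aws:policy/AWSCodeDeployRoleForECS"


def codedeploy_trust_policy():
    """Return a trust policy that lets the CodeDeploy service assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "codedeploy.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def create_codedeploy_role(iam_client, role_name="ecsCodeDeployRole"):
    """Create the role, then attach the managed policy to it.

    iam_client is passed in (rather than created here) so that the function
    can be exercised with a stub before it touches a real account.
    """
    iam_client.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=json.dumps(codedeploy_trust_policy()),
    )
    iam_client.attach_role_policy(
        RoleName=role_name,
        PolicyArn=CODEDEPLOY_ECS_POLICY_ARN,
    )
    return role_name
```

To run this against an account, call create_codedeploy_role(boto3.client("iam")) with credentials that are allowed to create roles and attach policies.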

You get the following error: "service failed to launch a task with"

You might get the following error message:

"service failed to launch a task with (error ECS was unable to assume the role that was provided for this task: Verify that the IAM role being passed has the proper trust relationship and permissions and that your IAM user has permissions to pass this role)"

If you get this error message, then check the IAM role that the error message returns. The role must have a trust relationship that allows the ECS tasks service, ecs-tasks.amazonaws.com, to assume it. The trust relationship for your role must look similar to the following:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": [
          "ec2.amazonaws.com",
          "ecs-tasks.amazonaws.com"
        ]
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
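
To check an existing role against this requirement, a small helper can inspect the trust policy after it's parsed into a Python dictionary. The following is a simplified sketch: the function name is hypothetical, and it deliberately ignores wildcard actions, Condition blocks, and Deny statements, so treat a True result as a first check rather than a full policy evaluation.

```python
def allows_service_to_assume(trust_policy, service="ecs-tasks.amazonaws.com"):
    """Return True if an Allow statement lets `service` call sts:AssumeRole.

    Simplification: wildcards, Condition blocks, and Deny statements are
    not evaluated.
    """

    def as_list(value):
        # IAM accepts either a single value or a list in these fields.
        if value is None:
            return []
        return value if isinstance(value, list) else [value]

    for statement in as_list(trust_policy.get("Statement")):
        if statement.get("Effect") != "Allow":
            continue
        if "sts:AssumeRole" not in as_list(statement.get("Action")):
            continue
        principal = statement.get("Principal")
        if not isinstance(principal, dict):
            # A bare "*" principal is not a service principal; skip it.
            continue
        if service in as_list(principal.get("Service")):
            return True
    return False
```

The trust policy for a role is available from the IAM GetRole API in the AssumeRolePolicyDocument field.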

Load balancer or Amazon ECS

When you use blue/green deployments with a load balancer in Amazon ECS, you might see the following issues:

Your ECS service is failing to stabilize due to health check failures

If your ECS service fails to stabilize because of health check failures, then review your port mappings. The port mappings of your task definitions must match the ports of your target groups. For more information, see How can I get my Amazon ECS tasks that use the Amazon EC2 launch type to pass the Application Load Balancer health check?
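
One way to review the port mappings is to compare the container name and port that the service registers with the load balancer against the ports that the task definition exposes. The sketch below assumes that task_definition is the inner taskDefinition dictionary from the ECS DescribeTaskDefinition response and that load_balancers is the loadBalancers list from DescribeServices. The function name is hypothetical, and the check covers only the container side of the mapping, not the target group's own port or health check settings.

```python
def find_port_mismatches(task_definition, load_balancers):
    """Return a list of human-readable problems, empty if none are found."""
    # Map each container name to the set of container ports it exposes.
    exposed = {}
    for container in task_definition.get("containerDefinitions", []):
        ports = {
            mapping.get("containerPort")
            for mapping in container.get("portMappings", [])
        }
        exposed[container.get("name")] = ports

    problems = []
    for entry in load_balancers:
        name = entry.get("containerName")
        port = entry.get("containerPort")
        if name not in exposed:
            problems.append(f"container {name!r} is not in the task definition")
        elif port not in exposed[name]:
            problems.append(f"container {name!r} does not map port {port}")
    return problems
```

An empty list means that every load balancer entry refers to a container port that the task definition actually exposes.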

You get the following error: "Primary taskset target group must be behind listener"

You might get the following error message:

"The ELB could not be updated due to the following error: Primary taskset target group must be behind listener"

You get this error when your Elastic Load Balancing listeners or target groups are misconfigured. The ELB primary listener and test listener must both point to the primary target group that's currently serving your workloads.
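
The listener requirement can be checked before a deployment with a short script. The following sketch assumes that each listener is a dictionary in the shape that the Elastic Load Balancing DescribeListeners API returns, and that you already know the ARN of the primary target group. It looks only at each listener's default actions, not at additional listener rules, and the function names are hypothetical.

```python
def forward_target_groups(listener):
    """Collect the target group ARNs that a listener's default actions forward to."""
    arns = set()
    for action in listener.get("DefaultActions", []):
        if action.get("Type") != "forward":
            continue
        # A forward action can name one target group directly ...
        if action.get("TargetGroupArn"):
            arns.add(action["TargetGroupArn"])
        # ... or list one or more target groups under ForwardConfig.
        for group in action.get("ForwardConfig", {}).get("TargetGroups", []):
            if group.get("TargetGroupArn"):
                arns.add(group["TargetGroupArn"])
    return arns


def misrouted_listeners(primary_target_group_arn, listeners):
    """Return the ARNs of listeners that don't forward solely to the primary group."""
    return [
        listener.get("ListenerArn")
        for listener in listeners
        if forward_target_groups(listener) != {primary_target_group_arn}
    ]
```

Pass both the primary listener and the test listener in the listeners argument. An empty result means that both forward only to the primary target group.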

Traffic is still routed to the blue target group after successful deployment

After the deployment is complete, CodeDeploy automatically updates the primary listener of your load balancer to point to the green target group. However, CodeDeploy updates only the production listener that you specify. If CodeDeploy doesn't switch traffic after the deployment, then your ELB listeners might be configured with the wrong protocol or port. Make sure that the primary ELB listener uses the correct protocol and port.

Your ECS tasks fail Application Load Balancer health checks only during a new green deployment

If ECS tasks that run in the ECS service fail Application Load Balancer health checks during a new green deployment, then check your load balancer configuration. If another ECS service tries to register its tasks to the same green target group, then the registrations can conflict. Update the load balancer configuration to make sure that only one ECS service or port is registered to each target group.
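
To find target groups that more than one service registers to, you can scan the service descriptions for overlapping target group ARNs. The sketch below assumes that services is the services list from the ECS DescribeServices API, and that target groups appear in each service's loadBalancers list and, for blue/green services, in the loadBalancers list of each entry under taskSets. The function name is hypothetical.

```python
from collections import defaultdict


def shared_target_groups(services):
    """Return {target_group_arn: [service names]} for groups used by 2+ services."""
    users = defaultdict(set)
    for service in services:
        name = service.get("serviceName")
        # Load balancer entries at the service level.
        entries = list(service.get("loadBalancers", []))
        # Load balancer entries attached to individual task sets.
        for task_set in service.get("taskSets", []):
            entries.extend(task_set.get("loadBalancers", []))
        for entry in entries:
            arn = entry.get("targetGroupArn")
            if arn:
                users[arn].add(name)
    return {arn: sorted(names) for arn, names in users.items() if len(names) > 1}
```

Any key in the returned dictionary is a target group that needs to be assigned to a single service before the next deployment.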

Your ECS tasks inconsistently fail Application Load Balancer health checks

This issue might happen when your containers take more time to start than expected. To resolve this issue, check your container application code for the cause of the delay. Then, optimize your container application code. If you still can't resolve the issue, then set a health check grace period on your ECS service so that the containers have enough time to start.
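
The grace period can also be set programmatically. The following is a minimal sketch that assumes ecs_client is a boto3 ECS client (for example, boto3.client("ecs")) and uses the healthCheckGracePeriodSeconds parameter of the UpdateService API. The function name is hypothetical, and the cluster name, service name, and number of seconds are values that you choose for your workload.

```python
def set_health_check_grace_period(ecs_client, cluster, service, seconds):
    """Ask ECS to ignore failing load balancer health checks for `seconds`
    after a task starts, so that slow-starting containers aren't replaced early.
    """
    if seconds < 0:
        raise ValueError("seconds must be zero or greater")
    return ecs_client.update_service(
        cluster=cluster,
        service=service,
        healthCheckGracePeriodSeconds=seconds,
    )
```

Choose a value that is slightly longer than the slowest observed container startup. A very long grace period delays the detection of tasks that are actually unhealthy.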

Your ECS Service can't place a task because no container instance meets all of its requirements

If your ECS service can't place a task because no container instance meets all of its requirements, then ECS chooses the closest matching container instance. If the closest matching container instance has insufficient CPU units available, then the deployment fails. To resolve this issue, review your container instance resources before you perform a blue/green deployment and add more resources if necessary.
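
As a rough pre-deployment check, you can compare the CPU and memory that a task needs against what each container instance has left. The sketch below assumes that container_instances is the containerInstances list from the ECS DescribeContainerInstances API, where remainingResources holds entries named CPU and MEMORY with an integerValue. The function names are hypothetical, and the check ignores ports, attributes, and placement constraints.

```python
def remaining(container_instance, resource_name):
    """Return the remaining integer amount of a resource, or 0 if it's absent."""
    for resource in container_instance.get("remainingResources", []):
        if resource.get("name") == resource_name:
            return resource.get("integerValue", 0)
    return 0


def instances_that_fit(container_instances, cpu_units, memory_mib):
    """Return ARNs of ACTIVE instances with enough free CPU and memory for one task."""
    return [
        instance.get("containerInstanceArn")
        for instance in container_instances
        if instance.get("status") == "ACTIVE"
        and remaining(instance, "CPU") >= cpu_units
        and remaining(instance, "MEMORY") >= memory_mib
    ]
```

During a blue/green deployment, the green tasks start while the blue tasks are still running, so the cluster needs free capacity for the new tasks on top of what the current tasks use.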

CloudFormation

Note: These troubleshooting steps are applicable only if you use CloudFormation for your blue/green deployment.

When you use blue/green deployments with CloudFormation in Amazon ECS, you might see the following issues:

Your CloudFormation stack fails with an internal failure error

If you create a change set that triggers a blue/green deployment, then your CloudFormation stack might fail with an internal failure error. To resolve this issue, use a CloudFormation service role and attach this role to your CloudFormation stack. The service role must have the necessary permissions to run all stack operations.

Note: After the stack is created, you can't remove the service role from the stack.
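
When you create the change set through the API, you can pass the service role with the request. The following sketch assumes that cfn_client is a boto3 CloudFormation client (for example, boto3.client("cloudformation")) and uses the RoleARN parameter of the CreateChangeSet API. The function name is hypothetical, and the stack name, change set name, template body, and role ARN are placeholders that you supply.

```python
def create_change_set_with_role(
    cfn_client, stack_name, change_set_name, template_body, role_arn,
    capabilities=None,
):
    """Create a change set that CloudFormation runs with the given service role."""
    request = {
        "StackName": stack_name,
        "ChangeSetName": change_set_name,
        "TemplateBody": template_body,
        "RoleARN": role_arn,
    }
    # Only send Capabilities when the caller asks for them, for example
    # ["CAPABILITY_IAM"] for templates that create IAM resources.
    if capabilities:
        request["Capabilities"] = list(capabilities)
    return cfn_client.create_change_set(**request)
```

CloudFormation then uses the role's permissions for operations on that stack.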

You get the following error: "CodeDeploy can't perform BlueGreen style update properly"

If you create a change set to trigger the blue/green deployment, then you might get the following error:

"'CodeDeployBlueGreenHook' of type AWS::CodeDeploy::BlueGreen failed with message: The TaskDefinition logical Id [] is the same between initial and final template, CodeDeploy can't perform BlueGreen style update properly"

If you specify a test listener that already points to the green target group, then the CodeDeploy hook fails with this error. To resolve this issue, update your test listener so that it doesn't point to the green target group before you run the blue/green deployment.

Important: Don't use the UpdateService API to cancel and roll back the blue/green deployment. Instead, use the CreateDeployment API. To roll back a deployment, use the CodeDeploy StopDeployment API.
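
A call to the StopDeployment API can be sketched as follows. The sketch assumes that codedeploy_client is a boto3 CodeDeploy client (for example, boto3.client("codedeploy")) and that you have the ID of the in-progress deployment. The function name is hypothetical.

```python
def stop_and_roll_back(codedeploy_client, deployment_id):
    """Stop an in-progress deployment and request a rollback.

    Returns the (status, statusMessage) pair from the StopDeployment response.
    """
    response = codedeploy_client.stop_deployment(
        deploymentId=deployment_id,
        autoRollbackEnabled=True,  # ask CodeDeploy to roll back to the previous version
    )
    return response.get("status"), response.get("statusMessage")
```

The returned status reports whether the stop request is still pending or has succeeded. It doesn't confirm that the rollback itself has finished.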

Related information

Validate the state of an Amazon ECS service before deployment

How do I perform blue/green deployments for services hosted on Amazon ECS?

ecs-blue-green-deployment on the GitHub website

Perform ECS blue/green deployments through CodeDeploy using AWS CloudFormation

AWS OFFICIAL
Updated a month ago