I want to troubleshoot a stopped Windows task in an Amazon Elastic Container Service (Amazon ECS) cluster.
Resolution
Note: If you receive errors when you run AWS Command Line Interface (AWS CLI) commands, then see Troubleshooting errors for the AWS CLI. Also, make sure that you're using the most recent AWS CLI version.
To troubleshoot a stopped task, use the AWS Management Console or AWS CLI to view stopped task errors. Or, use DescribeTasks to get information about the stopped task.
Important: You can access the information that DescribeTasks returns about a stopped task only within 1 hour of when the task stopped. To retain this data for longer, use the AWS CloudFormation template from amazon-ecs-stopped-tasks-cwlogs on the GitHub website. The template stores, in Amazon CloudWatch Logs, the events that Amazon EventBridge receives when a task stops.
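The DescribeTasks output can be condensed into the few fields that usually matter when you troubleshoot a stopped task. The following is a minimal sketch, not an official tool: the function name Get-StoppedTaskSummary is hypothetical, and cluster-name and task-id in the usage comments are placeholders. The field names (taskArn, stopCode, stoppedReason, and the name, exitCode, and reason of each container) are taken from the DescribeTasks response.

```powershell
# Hypothetical helper: summarize the JSON text returned by "aws ecs describe-tasks".
function Get-StoppedTaskSummary {
    param([Parameter(Mandatory)][string]$Json)

    $response = $Json | ConvertFrom-Json
    foreach ($task in $response.tasks) {
        foreach ($container in $task.containers) {
            # Emit one row per container so that container-level reasons are visible.
            [pscustomobject]@{
                TaskArn       = $task.taskArn
                StopCode      = $task.stopCode
                StoppedReason = $task.stoppedReason
                Container     = $container.name
                ExitCode      = $container.exitCode
                Reason        = $container.reason
            }
        }
    }
}

# Example usage (placeholders: cluster-name, task-id):
# $json = (aws ecs describe-tasks --cluster cluster-name --tasks task-id --output json) -join "`n"
# Get-StoppedTaskSummary -Json $json | Format-List
```

The function takes JSON text instead of calling the AWS CLI itself, so you can also run it against a saved response after the 1 hour window has passed.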
Complete the following troubleshooting steps for the error that you receive.
Common stopped task errors
To troubleshoot common stopped task errors, such as container instance health check issues, see Why is my Amazon ECS task stopped?.
If you receive a CannotPullContainerError: API error, then see How can I resolve the Amazon Elastic Container Registry (Amazon ECR) error "CannotPullContainerError: API error" in Amazon ECS?.
If you receive an OutOfMemory error, then see How do I troubleshoot OutOfMemory errors in Amazon ECS?.
"No valid providers in chain" error
If ECS_ENABLE_AWSLOGS_EXECUTIONROLE_OVERRIDE isn't set on your container instance, then you receive an error that looks similar to "CannotStartContainerError: Error response from daemon: failed to initialize logging driver: failed to create Cloudwatch log stream: NoCredentialProviders: no valid providers in chain".
To resolve this issue, set ECS_ENABLE_AWSLOGS_EXECUTIONROLE_OVERRIDE to true in the user data script of the container instance. Example PowerShell syntax:
<powershell>
[Environment]::SetEnvironmentVariable("ECS_ENABLE_AWSLOGS_EXECUTIONROLE_OVERRIDE", $TRUE, "Machine")
Initialize-ECSAgent -Cluster cluster-name -EnableTaskIAMRole -LoggingDrivers '["json-file","awslogs"]'
</powershell>
Note: Replace cluster-name with your cluster name.
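To check whether the variable took effect on a running container instance, you can read it back at the machine scope. This is a minimal sketch that assumes you run it in a PowerShell session on the instance itself. It only reports the stored value and doesn't change the Amazon ECS agent configuration.

```powershell
# Read the machine-level value that the user data script is expected to set.
$name  = 'ECS_ENABLE_AWSLOGS_EXECUTIONROLE_OVERRIDE'
$value = [Environment]::GetEnvironmentVariable($name, 'Machine')

# The comparison is case-insensitive, so both 'True' and 'true' count as set.
if ($value -eq 'True') {
    Write-Output "$name is set to '$value'."
} else {
    Write-Output "$name is not set to True (current value: '$value')."
}
```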
"The container operating system does not match the host operating system" error
If the host operating system (OS) version doesn't match the OS version of the Windows container image's base image, then you receive an error that looks similar to "CannotStartContainerError: ResourceInitializationError: failed to create new container runtime task: failed to create shim task: hcs::CreateComputeSystem abcdxyz: The container operating system does not match the host operating system".
To resolve this issue, verify that the AWS Fargate or Amazon Elastic Compute Cloud (Amazon EC2) host uses the same OS version as the base image of the container.
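On an Amazon EC2 Windows container instance, you can compare the host version with the version recorded in the image metadata. The following is a minimal sketch: the function name Test-WindowsBuildMatch and the placeholder image-name are hypothetical. It assumes that a match on the major, minor, and build numbers is the relevant criterion, and that docker image inspect reports an OsVersion field for the Windows image.

```powershell
# Hypothetical helper: return $true when two Windows version strings share
# the same major, minor, and build numbers (the revision number is ignored).
function Test-WindowsBuildMatch {
    param(
        [Parameter(Mandatory)][string]$HostVersion,
        [Parameter(Mandatory)][string]$ImageVersion
    )

    $h = [version]$HostVersion
    $i = [version]$ImageVersion
    ($h.Major -eq $i.Major) -and ($h.Minor -eq $i.Minor) -and ($h.Build -eq $i.Build)
}

# Example usage on the container instance (image-name is a placeholder):
# $hostVersion  = [Environment]::OSVersion.Version.ToString()
# $imageVersion = docker image inspect --format '{{.OsVersion}}' image-name
# Test-WindowsBuildMatch -HostVersion $hostVersion -ImageVersion $imageVersion
```

A result of False suggests that the image was built from a base image for a different Windows release than the host runs.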
"Unable to assume the role" error
If the container instance can't assume the AWS Identity and Access Management (IAM) role, then you receive an error that looks similar to "Unable to assume the role "arn:aws:iam::abcdefxyz123:role/yyyyyyyy"".
To resolve this issue, make sure that the user data script of the container instance sets the -EnableTaskIAMRole option. Example PowerShell syntax:
<powershell>
Import-Module ECSTools
Initialize-ECSAgent -Cluster 'windows' -EnableTaskIAMRole
</powershell>
Note: In this example, windows is the cluster name. Replace it with your cluster name.
Also, make sure that you meet the Windows instance configuration requirements.
Related information
Bootstrapping Amazon ECS Windows container instances to pass data