Short description
When you launch an Amazon Elastic Container Service (Amazon ECS) task on the Fargate launch type, you might receive one of the following error messages:
- "ResourceInitializationError: unable to pull secrets or registry auth: pull command failed: : signal: killed"
- "ResourceInitializationError: unable to pull secrets or registry auth: execution resource retrieval failed: unable to retrieve secret from asm: service call has been retried."
- "ResourceInitializationError: unable to pull secrets or registry auth: execution resource retrieval failed: unable to retrieve ecr registry auth: service call has been retried 3 time(s): RequestError: send request failed caused by: Post "https://api.ecr.<region>.amazonaws.com/": dial tcp …443: i/o timeout. Please check your task network configuration."
- "ResourceInitializationError: unable to pull secrets or registry auth: execution resource retrieval failed: unable to retrieve secret from asm: service call has been retried 5 time(s): failed to fetch secret arn:aws:secretsmanager…"
- "ResourceInitializationError: unable to pull secrets or registry auth: execution resource retrieval failed: unable to retrieve secret from asm: service call has been retried 1 time(s): failed to fetch secret arn:aws:secretsmanager:<region>:<accountID>:secret:<secretName> from secrets manager: InvalidParameter: 1 validation error(s) found. – (minimum field size of 32/ maximum field size of 64), GetSecretValueInput.VersionId."
- "ResourceInitializationError: unable to pull secrets or registry auth: execution resource retrieval failed: unable to retrieve secret from asm: service call has been retried 1 time(s): failed to fetch secret arn:aws:secretsmanager:<region>:<accountID>:secret:<secretName> from secrets manager: AccessDeniedException: User: arn:aws:sts::<accountID>:assumed-role/<roleName> is not authorized to perform: secretsmanager:GetSecretValue on resource: arn:aws:secretsmanager:<region>:<accountID>:secret:<secretName> because no identity-based policy allows the secretsmanager:GetSecretValue action status code: 400"
AWS Fargate platform versions 1.4.0 or later use the task elastic network interface to pull the image and secrets. All of this network traffic flows through the network interface within your Amazon Virtual Private Cloud (Amazon VPC), so you can use VPC Flow Logs to view the traffic. Because Fargate puts the network interface in your Amazon VPC, the task also depends on your network configuration.
The Amazon ECS container agent uses the task execution AWS Identity and Access Management (IAM) role to get information from the Parameter Store, a capability of AWS Systems Manager, and AWS Secrets Manager.
For data that you encrypt with a customer managed AWS Key Management Service (AWS KMS) key, grant the following permissions to the task execution IAM role:
- ssm:GetParameters
- secretsmanager:GetSecretValue
- kms:Decrypt
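The permissions listed above can be expressed as an identity-based policy for the task execution role. The following is a minimal sketch, not an official policy: the helper name `build_execution_role_policy` is hypothetical, and the caller supplies the ARNs of the secrets, parameters, and KMS keys.

```python
import json


def build_execution_role_policy(secret_arns, parameter_arns, kms_key_arns):
    """Return an IAM policy document (dict) with one Allow statement per service.

    Nothing is looked up in AWS; the ARN lists come from the caller.
    """
    statements = []
    if secret_arns:
        statements.append({
            "Effect": "Allow",
            "Action": ["secretsmanager:GetSecretValue"],
            "Resource": list(secret_arns),
        })
    if parameter_arns:
        statements.append({
            "Effect": "Allow",
            "Action": ["ssm:GetParameters"],
            "Resource": list(parameter_arns),
        })
    if kms_key_arns:
        # Needed only when the data is encrypted with a customer managed key.
        statements.append({
            "Effect": "Allow",
            "Action": ["kms:Decrypt"],
            "Resource": list(kms_key_arns),
        })
    return {"Version": "2012-10-17", "Statement": statements}


def policy_as_json(policy):
    """Serialize the policy so that it can be pasted into the IAM console."""
    return json.dumps(policy, indent=2)
```

Scoping each statement to specific ARNs instead of "*" keeps the role limited to the data that the task actually references.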
Resolution
Note: If you receive errors when you run AWS Command Line Interface (AWS CLI) commands, then see Troubleshooting errors for the AWS CLI. Also, make sure that you're using the most recent AWS CLI version.
Use the TroubleshootECSTaskFailedToStart runbook
Use the AWSSupport-TroubleshootECSTaskFailedToStart runbook to troubleshoot the Amazon ECS tasks that fail to start.
Important: Use the runbook in the same AWS Region where your ECS cluster resources are located. Use the most recently failed task ID so that the task state cleanup doesn't interrupt the analysis. If the failed task is part of an Amazon ECS service, then use the most recently failed task in the service. The failed task must be visible in ecs:DescribeTasks when you run Automation. By default, stopped ECS tasks are visible for 1 hour after the tasks enter the Stopped state.
To run the AWSSupport-TroubleshootECSTaskFailedToStart runbook, complete the following steps:
- Open the AWS Systems Manager console.
- In the navigation pane, under Change Management, choose Automation.
- Choose Execute automation.
- Choose the Owned by Amazon tab.
- Under Automation document, enter TroubleshootECSTaskFailedToStart in the search bar.
- Select the AWSSupport-TroubleshootECSTaskFailedToStart card.
Note: Don't select the hyperlinked automation name.
- Choose Next.
- For Execute automation document, choose Simple execution.
- In the Input parameters section, for AutomationAssumeRole, enter the ARN of the role that allows Automation to perform actions.
Note: Be sure that either the service role or the IAM user or role has the required IAM permissions to run the AWSSupport-TroubleshootECSTaskFailedToStart runbook. If you don't specify an IAM role, then Automation uses the permissions of the IAM user or role that runs the runbook. For information about how to create the service role for Automation, see Task 1: Create a service role for Automation.
- For ClusterName, enter the name of the cluster where the task failed to start.
- For TaskId, enter the identification for the task that most recently failed.
- Choose Execute.
Note: After the runbook runs, the analysis results appear in the Global output section. Wait for the document status to change to Success before you review the results. Also, review the Output section for any exceptions.
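The console steps above can also be scripted. The following sketch assumes that boto3 is installed and that credentials are configured; the function names are hypothetical. The document name and the ClusterName, TaskId, and AutomationAssumeRole parameters are the ones named in the steps above.

```python
RUNBOOK_NAME = "AWSSupport-TroubleshootECSTaskFailedToStart"


def build_runbook_request(cluster_name, task_id, automation_assume_role=None):
    """Build the keyword arguments for ssm.start_automation_execution.

    Automation parameter values are passed as lists of strings.
    """
    parameters = {"ClusterName": [cluster_name], "TaskId": [task_id]}
    if automation_assume_role:
        parameters["AutomationAssumeRole"] = [automation_assume_role]
    return {"DocumentName": RUNBOOK_NAME, "Parameters": parameters}


def start_runbook(region, cluster_name, task_id, automation_assume_role=None):
    """Start the runbook in the Region of the cluster and return the execution ID."""
    import boto3  # imported here so the builder above works without boto3

    ssm = boto3.client("ssm", region_name=region)
    request = build_runbook_request(cluster_name, task_id, automation_assume_role)
    response = ssm.start_automation_execution(**request)
    return response["AutomationExecutionId"]
```

If you omit the role, then Automation runs with the permissions of the caller, as described in the note about AutomationAssumeRole.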
You can also manually troubleshoot the issue.
Check the routes from your subnets to the internet
If your Fargate task is in a public subnet, then verify that you assigned a public IP address to the task. Also, confirm that the subnet's route table has a default route (0.0.0.0/0) to an internet gateway. When you launch a new task or create a new service, turn on Auto-assign public IP.
If you used an AWS CloudFormation stack to create your Amazon ECS service, then modify the NetworkConfiguration property for AWS::ECS::Service to update the service. To update the configuration for existing services, use CloudFormation to turn on the AssignPublicIp parameter. Or, run the following update-service AWS CLI command:
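aws ecs update-service --service serviceName --region regionName --network-configuration "awsvpcConfiguration={subnets=[subnet-123,subnet-456],securityGroups=[sg-123,sg-456],assignPublicIp=ENABLED}"
Note: Replace serviceName with your service name, regionName with your Region, and the example subnet and security group IDs with your own. If the service isn't in the default cluster, then also include the --cluster option.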
If you use the following configurations, then don't use the internet gateway in the public subnet to reach Secrets Manager or Systems Manager:
- The Secrets Manager or Systems Manager VPC endpoints are in a public subnet.
- You turned on AmazonProvidedDNS in your Amazon VPC DHCP settings.
Instead, use an Amazon VPC endpoint.
Note: You can't turn on Auto-assign public IP for existing tasks. To reconfigure existing services, use the AWS CLI and not the AWS Management Console.
If your Fargate task is in a private subnet, then verify that the subnet's route table has a default route (0.0.0.0/0) to the internet connectivity source.
The internet connectivity source can be a NAT gateway, AWS PrivateLink, or a custom domain name server.
- If you use a NAT gateway, then put your NAT gateway in a public subnet. For more information, see Architecture with an internet gateway and a NAT gateway using AWS Network Firewall.
- If you use PrivateLink, then verify that the Amazon VPC endpoints' security groups allow traffic from the Fargate tasks.
- If you use a custom domain name server, then confirm the DNS query settings. DNS queries must have outbound access on port 53 over both UDP and TCP. The task must also have outbound HTTPS access on port 443.
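To check where a subnet's default route points, you can inspect the JSON that `aws ec2 describe-route-tables` returns. The following sketch classifies the 0.0.0.0/0 route of one entry from the `RouteTables` list. The function name and return labels are illustrative, and the sketch looks at IPv4 default routes only.

```python
def default_route_target(route_table):
    """Classify the target of the 0.0.0.0/0 route in one route table dict.

    Returns "internet-gateway", "nat-gateway", "blackhole", "other", or "none".
    """
    for route in route_table.get("Routes", []):
        if route.get("DestinationCidrBlock") != "0.0.0.0/0":
            continue
        if route.get("State") == "blackhole":
            # The target was deleted, so traffic on this route is dropped.
            return "blackhole"
        if route.get("NatGatewayId"):
            return "nat-gateway"
        gateway_id = route.get("GatewayId") or ""
        if gateway_id.startswith("igw-"):
            return "internet-gateway"
        return "other"
    return "none"
```

A public subnet is expected to return "internet-gateway" and a private subnet that uses a NAT gateway is expected to return "nat-gateway". A result of "none" or "blackhole" means that the task has no working default route.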
Check your network ACL and security group settings
Verify that your network access control list (network ACL) and security groups don't block outbound access to port 443 from the subnet. For more information, see Control traffic to your AWS resources using security groups.
Note: Fargate tasks must have outbound access to port 443 to allow outgoing traffic and access Amazon ECS endpoints.
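As a quick check of the security group side, the following sketch scans the egress rules of one entry from the `SecurityGroups` list that `aws ec2 describe-security-groups` returns. The function names are hypothetical. The check is simplified: it doesn't evaluate rule destinations, and it doesn't cover network ACLs, which you must review separately.

```python
def rule_covers_tcp_443(permission):
    """Return True if one egress rule covers TCP port 443."""
    protocol = permission.get("IpProtocol")
    if protocol == "-1":
        # "-1" means all protocols and all ports.
        return True
    if protocol != "tcp":
        return False
    from_port = permission.get("FromPort")
    to_port = permission.get("ToPort")
    if from_port is None or to_port is None:
        return False
    return from_port <= 443 <= to_port


def allows_outbound_443(security_group):
    """Return True if any egress rule of the security group covers TCP 443."""
    egress_rules = security_group.get("IpPermissionsEgress", [])
    return any(rule_covers_tcp_443(rule) for rule in egress_rules)
```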
Check your Amazon VPC endpoints
If you use PrivateLink, then you must create the following required endpoints for Fargate platform versions 1.4.0 or later:
- com.amazonaws.region.ecr.dkr
- com.amazonaws.region.ecr.api
- S3 gateway endpoint
- com.amazonaws.region.logs
For more information, see Considerations for Amazon Elastic Container Registry (Amazon ECR) VPC endpoints.
Note: If your task definition uses Secrets Manager, Parameter Store, or Amazon CloudWatch Logs, then make sure that you define endpoints. For more information, see Using an AWS Secrets Manager VPC endpoint and Creating the VPC endpoints for Amazon ECS.
For PrivateLink, check that the VPC endpoint security group allows inbound traffic from the Fargate task security group or VPC CIDR range on TCP port 443.
To confirm that the Fargate infrastructure has service access, check the VPC endpoint policies and Amazon Simple Storage Service (Amazon S3) gateway endpoint policies.
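To spot a missing endpoint, you can compare the required service names against the `ServiceName` values that `aws ec2 describe-vpc-endpoints` returns for your VPC. The following sketch uses hypothetical function names. The Secrets Manager and Systems Manager service names aren't listed in the text above; the sketch assumes that they follow the same com.amazonaws.region.service pattern.

```python
def required_endpoint_services(region, uses_secrets_manager=False,
                               uses_parameter_store=False):
    """Return the endpoint service names that a Fargate task needs in a Region."""
    services = [
        f"com.amazonaws.{region}.ecr.dkr",
        f"com.amazonaws.{region}.ecr.api",
        f"com.amazonaws.{region}.s3",    # gateway endpoint for image layers
        f"com.amazonaws.{region}.logs",
    ]
    if uses_secrets_manager:
        services.append(f"com.amazonaws.{region}.secretsmanager")
    if uses_parameter_store:
        services.append(f"com.amazonaws.{region}.ssm")
    return services


def missing_endpoint_services(region, existing_service_names, **usage):
    """Return required service names that have no endpoint in the VPC."""
    existing = set(existing_service_names)
    required = required_endpoint_services(region, **usage)
    return [name for name in required if name not in existing]
```

An empty result shows only that the endpoints exist. The endpoint policies and security groups still need the checks described above.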
Check your IAM roles and permissions
The task execution role grants the required permissions to the Amazon ECS container and Fargate agents to make API calls for the task.
Fargate requires the task execution role when you take the following actions:
- Pull a container image from Amazon ECR.
- Use the awslogs log driver.
- Use private registry authentication.
- Use Secrets Manager secrets or Parameter Store parameters to reference sensitive data.
In the preceding scenarios, define the required permissions in your task execution role. When you access Secrets Manager secrets or Parameter Store parameters to retrieve the sensitive data, confirm that you have the required secretsmanager:GetSecretValue or ssm:GetParameters permissions. For a list of required permissions, see Secrets Manager or Systems Manager permissions.
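To confirm that a policy attached to the task execution role includes the required actions, you can scan the policy document. The following sketch is deliberately simplified and uses a hypothetical function name: it matches action names against Allow statements with wildcard support, and it ignores Deny statements, Resource scoping, and Condition blocks. Treat a clean result as a hint, not as proof of access.

```python
from fnmatch import fnmatchcase


def _as_list(value):
    """IAM allows a single string where a list is expected; normalize to a list."""
    return [value] if isinstance(value, str) else list(value)


def missing_actions(policy_document, required_actions):
    """Return the required actions that no Allow statement in the policy covers."""
    statements = policy_document.get("Statement", [])
    if isinstance(statements, dict):
        statements = [statements]
    allowed_patterns = []
    for statement in statements:
        if statement.get("Effect") == "Allow" and "Action" in statement:
            # Action names are case-insensitive, so compare in lowercase.
            allowed_patterns += [a.lower() for a in _as_list(statement["Action"])]
    return [
        action for action in required_actions
        if not any(fnmatchcase(action.lower(), p) for p in allowed_patterns)
    ]
```

For example, a policy that allows only "secretsmanager:Get*" leaves "ssm:GetParameters" in the returned list.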
Check the sensitive data in the Amazon ECS task definition
Check that the secret and parameter names match the referenced names in your Amazon ECS task definition. Then, check that the values in the container definition match the values in your Amazon ECS task definition. For more information, see How can I securely pass secrets or sensitive information to containers in an Amazon ECS task?
Make sure that you configure the secret or parameter with the same ARN or name that's specified in the task definition. If the resource exists in a different Region, then you must provide the full ARN.
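To review every reference in one place, you can list the secrets entries and private registry credentials from the task definition JSON. The following sketch uses a hypothetical function name and expects the object under the `taskDefinition` key of the `aws ecs describe-task-definition` output.

```python
def sensitive_references(task_definition):
    """Return (container name, variable name, valueFrom) for each reference."""
    references = []
    for container in task_definition.get("containerDefinitions", []):
        container_name = container.get("name", "")
        for secret in container.get("secrets", []):
            references.append(
                (container_name, secret.get("name"), secret.get("valueFrom"))
            )
        credentials = container.get("repositoryCredentials") or {}
        if credentials.get("credentialsParameter"):
            # Private registry authentication also points at a secret.
            references.append(
                (container_name, "repositoryCredentials",
                 credentials["credentialsParameter"])
            )
    return references
```

You can then compare each valueFrom entry against the names and ARNs shown in the Secrets Manager and Systems Manager consoles.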
Use the VersionId parameter within GetSecretValueInput to specify the version of the secret value to retrieve. As the InvalidParameter error message indicates, the VersionId value must be between 32 and 64 characters long. If you don't require a specific version, then delete the VersionId field. Secrets Manager retrieves the latest version by default.
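The following sketch checks the version ID that's embedded in a secret reference. It assumes the extended reference format from the Amazon ECS documentation, which isn't shown in the text above: the secret ARN followed by optional json-key, version-stage, and version-id fields that are separated by colons. The function names are hypothetical. The 32 to 64 character limit comes from the InvalidParameter error message.

```python
def split_secret_reference(value_from):
    """Split a valueFrom ARN into the secret ARN and its optional fields."""
    parts = value_from.split(":")
    is_secret_arn = (
        7 <= len(parts) <= 10
        and parts[0] == "arn"
        and parts[2] == "secretsmanager"
        and parts[5] == "secret"
    )
    if not is_secret_arn:
        raise ValueError("not a Secrets Manager ARN reference")
    # Pad the optional fields so that there are always exactly three.
    extras = parts[7:] + [""] * (10 - len(parts))
    json_key, version_stage, version_id = extras
    return {
        "secret_arn": ":".join(parts[:7]),
        "json_key": json_key or None,
        "version_stage": version_stage or None,
        "version_id": version_id or None,
    }


def version_id_problem(value_from):
    """Return a description of an invalid version ID, or None if it looks fine."""
    version_id = split_secret_reference(value_from)["version_id"]
    if version_id is None:
        return None
    if not 32 <= len(version_id) <= 64:
        return f"version ID has {len(version_id)} characters; expected 32 to 64"
    return None
```

A reference with no version ID returns None, which matches the advice to omit the field when you don't need a specific version.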
If the Parameter Store parameter and task are in the same Region, then use the full ARN or the name of the parameter. If the parameter exists in a different Region, then you must specify the full ARN.
To check the parameter name and ARN, complete the following steps:
- Open the AWS Systems Manager console.
- In the navigation pane, choose Parameter Store, and then confirm your parameter name.
- To get the parameter's ARN, run the following get-parameter AWS CLI command:
aws ssm get-parameter --name name_of_parameter_store_secret --with-decryption
Note: Replace name_of_parameter_store_secret with your Parameter Store secret name. Parameters that reference Secrets Manager secrets can't use the Parameter Store version or history features. For more information, see Restrictions.
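The Region rule for parameters can be sketched as a small helper that returns the value to use in the task definition. The function names are hypothetical, and the account ID and parameter names in the usage note are placeholders. The ARN layout assumed here is the standard Systems Manager parameter ARN, where a name with a leading slash is appended directly after "parameter".

```python
def parameter_arn(region, account_id, name, partition="aws"):
    """Build the full ARN of a Parameter Store parameter."""
    if name.startswith("/"):
        resource = f"parameter{name}"
    else:
        resource = f"parameter/{name}"
    return f"arn:{partition}:ssm:{region}:{account_id}:{resource}"


def parameter_reference(task_region, parameter_region, account_id, name):
    """Return the name when the Regions match, otherwise the full ARN."""
    if task_region == parameter_region:
        return name
    return parameter_arn(parameter_region, account_id, name)
```

For example, `parameter_reference("us-east-1", "us-west-2", "111122223333", "/prod/db")` returns the full ARN because the task and the parameter are in different Regions.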
Related information
Viewing Amazon ECS stopped task errors
Amazon ECS task networking options for the Fargate launch type
Amazon ECR interface VPC endpoints (AWS PrivateLink)
Using CloudWatch Logs with interface VPC endpoints