I received one of the following errors when I launched an Amazon Elastic Container Service (Amazon ECS) task: "ResourceInitializationError: unable to pull secrets or registry auth: pull command failed: : signal: killed" or "ResourceInitializationError: unable to pull secrets or registry auth: execution resource retrieval failed: unable to retrieve secret from asm: service call has been retried."
The AWS Fargate platform version 1.4.0 uses the task elastic network interface to pull the image and secrets. All network traffic flows through the elastic network interface within your Amazon Virtual Private Cloud (Amazon VPC). You can view this traffic through your Amazon VPC Flow Logs. However, the task uses your network configuration instead of using the elastic network interfaces that Fargate owns. This is because the elastic network interfaces are placed within your Amazon VPC.
The Amazon ECS container agent uses the task execution AWS Identity and Access Management (IAM) role to get information from the following services:
- AWS Systems Manager Parameter Store
- AWS Secrets Manager
If you encrypt data using a customer managed AWS Key Management Service (AWS KMS) key, then grant the following permissions to the task execution IAM role:
- ssm:GetParameters
- secretsmanager:GetSecretValue
- kms:Decrypt
Use the AWSSupport-TroubleshootECSTaskFailedToStart runbook to troubleshoot the Amazon ECS tasks that fail to start. This automation reviews the following configurations:
- Network connectivity to the configured container registry
- Missing IAM permissions required by the task execution role
- Virtual private cloud (VPC) endpoint connectivity
- Security group rule configuration
- AWS Secrets Manager secrets references
- Logging configuration
If the runbook's output doesn't provide recommendations, then use the manual troubleshooting approaches in the following section.
- Use the runbook in the same AWS Region where your ECS Cluster resources are located.
- When you use the runbook, you must use the most recently failed task ID. If the failed task is part of an Amazon ECS service, then use the most recently failed task in the service. The failed task must be visible to ecs:DescribeTasks while the automation runs. By default, stopped ECS tasks are visible for 1 hour after they enter the Stopped state. Using the most recently failed task ID prevents the task state cleanup from interrupting the analysis during the automation.
To run the AWSSupport-TroubleshootECSTaskFailedToStart runbook, complete the following steps:
- Open the AWS Systems Manager console.
- In the navigation pane, under Change Management, choose Automation.
- Choose Execute automation.
- Choose the Owned by Amazon tab.
- Under Automation document, search for TroubleshootECSTaskFailedToStart.
- Select the AWSSupport-TroubleshootECSTaskFailedToStart card.
Note: Make sure that you select the radio button on the card and not the hyperlinked automation name.
- Choose Next.
Note: After you run the automation, the analysis results populate the Global output section. Wait for the status of the document to change to Success before you review the results, and check the Output section for any exceptions.
- For Execute automation document, choose Simple execution.
- In the Input parameters section, for AutomationAssumeRole, enter the ARN of the role that allows Systems Manager Automation to perform actions.
Note: Be sure that either the AutomationAssumeRole or the IAM user or role has the required IAM permissions to run the AWSSupport-TroubleshootECSTaskFailedToStart runbook. If you don't specify an IAM role, then Systems Manager Automation uses the permissions of the IAM user or role that runs the runbook. For more information about creating the assume role for Systems Manager Automation, see Task 1: Create a service role for Automation.
- For ClusterName, enter the cluster name where the task failed to start.
- For TaskId, enter the identification for the task that most recently failed.
- Choose Execute.
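The console steps above can also be sketched with the AWS CLI. This is a minimal sketch, not the documented procedure: the function name run_troubleshoot_runbook and the AWS_CLI override variable are hypothetical helpers introduced here for illustration, while ClusterName and TaskId are the runbook's input parameters named in the steps.

```shell
# Hypothetical helper: start the runbook from the AWS CLI.
# AWS_CLI is an assumed override for dry runs (for example, AWS_CLI=echo
# prints the arguments instead of calling AWS).
run_troubleshoot_runbook() {
  cluster_name="$1"   # cluster where the task failed to start
  task_id="$2"        # ID of the most recently failed task
  region="$3"         # Region where the cluster is located
  "${AWS_CLI:-aws}" ssm start-automation-execution \
    --document-name "AWSSupport-TroubleshootECSTaskFailedToStart" \
    --parameters "ClusterName=${cluster_name},TaskId=${task_id}" \
    --region "${region}"
}

# Example call (placeholder values):
# run_troubleshoot_runbook my-cluster 0123456789abcdef us-east-1
```

The command returns an execution ID. You can pass that ID to aws ssm get-automation-execution to check the status and read the outputs.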
Based on the output of the automation, use one of the following manual troubleshooting steps.
Check the routes from your subnets to the internet
If you have a Fargate task in a public subnet, then verify that your task has an assigned public IP address. Also, confirm that the task's subnet has a default route (0.0.0.0/0) to an internet gateway. When you launch a new task or create a new service, turn on Auto-assign public IP.
If you use the following configurations, then don't use the internet gateway in the public subnet to reach the Secrets Manager or Systems Manager. Instead, use an Amazon VPC endpoint:
- The Secrets Manager or Systems Manager VPC endpoints are in a public subnet.
- You turned on AmazonProvidedDNS in your Amazon VPC DHCP settings.
Note: You can't turn on Auto-assign public IP for existing tasks. For existing services, you can use only the AWS Command Line Interface (AWS CLI) to reconfigure the services. You can't use the AWS Management Console. If you used an AWS CloudFormation stack to create your Amazon ECS service, then modify the NetworkConfiguration property of the AWS::ECS::Service resource to update the service.
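For an existing service, the AWS CLI reconfiguration mentioned in the note might look like the following. This is a sketch under assumptions: enable_public_ip and the AWS_CLI override are hypothetical names introduced here, and the subnet and security group IDs are placeholders that you must replace with your own.

```shell
# Hypothetical helper: update a service so that new tasks get a public IP.
# AWS_CLI is an assumed override for dry runs (for example, AWS_CLI=echo).
enable_public_ip() {
  cluster="$1"           # cluster name
  service="$2"           # service name
  subnets="$3"           # comma-separated subnet IDs
  security_groups="$4"   # comma-separated security group IDs
  "${AWS_CLI:-aws}" ecs update-service \
    --cluster "${cluster}" \
    --service "${service}" \
    --network-configuration \
    "awsvpcConfiguration={subnets=[${subnets}],securityGroups=[${security_groups}],assignPublicIp=ENABLED}"
}

# Example call (placeholder IDs):
# enable_public_ip my-cluster my-service subnet-aaa,subnet-bbb sg-123
```

The change applies to tasks that the service launches after the update. Tasks that are already running keep their original network configuration until they are replaced.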
If you have a Fargate task in a private subnet, then verify that your task has a default route (0.0.0.0/0) to the internet connectivity source. The internet connectivity source can be a NAT gateway, AWS PrivateLink, or other source.
- If you use a NAT gateway, then place your NAT gateway in a public subnet. For more information, see Architecture with an internet gateway and a NAT gateway.
- If you use PrivateLink, then be sure that your Fargate infrastructure can use the security groups for your Amazon VPC endpoints.
- If you use a custom domain name server, then confirm the DNS query's settings. The query must have outbound access on port 53 over both UDP and TCP. Also, it must have HTTPS access on port 443.
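To confirm which default route a subnet uses, one option is to query its route table from the AWS CLI. This is a minimal sketch: show_default_route and the AWS_CLI override are hypothetical names introduced here, and the subnet ID is a placeholder.

```shell
# Hypothetical helper: print the 0.0.0.0/0 route for a subnet's route table.
# AWS_CLI is an assumed override for dry runs (for example, AWS_CLI=echo).
show_default_route() {
  subnet_id="$1"   # subnet where the Fargate task runs
  "${AWS_CLI:-aws}" ec2 describe-route-tables \
    --filters "Name=association.subnet-id,Values=${subnet_id}" \
    --query "RouteTables[].Routes[?DestinationCidrBlock=='0.0.0.0/0']"
}

# Example call (placeholder ID):
# show_default_route subnet-0123456789abcdef0
```

In the result, a GatewayId that starts with igw- points to an internet gateway, and a NatGatewayId points to a NAT gateway. An empty result can mean that the subnet has no explicit route table association and uses the VPC's main route table, so check that table instead.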
Check your network ACL and security group settings
Verify that your network access control list (network ACL) and security groups don't block outbound access to port 443 from the subnet. For more information, see Control traffic to resources using security groups.
Note: Fargate tasks must have outbound access to port 443 to allow outgoing traffic and access Amazon ECS endpoints.
Check your Amazon VPC endpoints
If you use PrivateLink, then you must create the required endpoints.
The following endpoints are required for Fargate platform versions 1.4.0 or later:
- com.amazonaws.region.ecr.dkr
- com.amazonaws.region.ecr.api
- S3 gateway endpoint
For more information, see Considerations for Amazon ECR VPC endpoints.
Note: If your task definition uses Secrets Manager, Systems Manager parameters, or Amazon CloudWatch Logs, then you might need to define endpoints for those services as well.
If you use PrivateLink, then check that the security group for your Amazon VPC endpoints allows the correct traffic. The group must allow traffic from the Fargate task security group or Fargate task VPC CIDR range on TCP port 443.
To confirm that the Fargate infrastructure has service access, check the VPC endpoint policies and endpoint policies for Amazon Simple Storage Service (Amazon S3).
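To see which endpoints already exist in the VPC, a listing like the following can help. This is a sketch: list_vpc_endpoints and the AWS_CLI override are hypothetical names introduced here, and the VPC ID is a placeholder.

```shell
# Hypothetical helper: list the service name, type, and state of each
# VPC endpoint in a VPC.
# AWS_CLI is an assumed override for dry runs (for example, AWS_CLI=echo).
list_vpc_endpoints() {
  vpc_id="$1"   # VPC that contains the Fargate task's subnets
  "${AWS_CLI:-aws}" ec2 describe-vpc-endpoints \
    --filters "Name=vpc-id,Values=${vpc_id}" \
    --query "VpcEndpoints[].[ServiceName,VpcEndpointType,State]" \
    --output table
}

# Example call (placeholder ID):
# list_vpc_endpoints vpc-0123456789abcdef0
```

Compare the ServiceName column against the endpoints that your task needs, and confirm that each endpoint's state is available.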
Check your IAM roles and permissions
The task execution role grants the required permissions to the Amazon ECS container and Fargate agents to make API calls for the task. Fargate requires this role when you take the following actions:
- Pull a container image from Amazon Elastic Container Registry (Amazon ECR).
- Use the awslogs log driver.
- Use private registry authentication.
- Use Secrets Manager secrets or Systems Manager Parameter Store parameters to reference sensitive data.
If your use case involves any of the preceding scenarios, then define the required permissions in your task execution role. For a complete list of required permissions, see Amazon ECS task execution IAM role.
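To review what the task execution role currently allows, you can list its managed and inline policies. This is a minimal sketch: show_execution_role_policies and the AWS_CLI override are hypothetical names introduced here, and the role name in the example is a placeholder for the role in your task definition's executionRoleArn.

```shell
# Hypothetical helper: list the managed and inline policies on a role.
# AWS_CLI is an assumed override for dry runs (for example, AWS_CLI=echo).
show_execution_role_policies() {
  role_name="$1"   # name (not ARN) of the task execution role
  # Managed policies attached to the role
  "${AWS_CLI:-aws}" iam list-attached-role-policies --role-name "${role_name}"
  # Inline policies embedded in the role
  "${AWS_CLI:-aws}" iam list-role-policies --role-name "${role_name}"
}

# Example call (placeholder role name):
# show_execution_role_policies ecsTaskExecutionRole
```

The output lists policy names only. To read the statements, follow up with aws iam get-policy-version for a managed policy or aws iam get-role-policy for an inline policy.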
Check the referenced sensitive information in the Amazon ECS task definition
Check if the secret and parameter names match the referenced names in your Amazon ECS task definition. Then, check if the values in the container definition in your task definition match the values in your Amazon ECS task definition. For more information, see How can I pass secrets or sensitive information securely to containers in an Amazon ECS task?
If the Systems Manager Parameter Store parameter and task are in the same Region, then use the full ARN or the name of the parameter. If the parameter exists in a different Region, then you must specify the full ARN.
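To see exactly what the task definition references, you can print the secrets entries of each container definition. This is a sketch: show_task_secrets and the AWS_CLI override are hypothetical names introduced here, and the task definition name is a placeholder.

```shell
# Hypothetical helper: print the secrets (name and valueFrom) that a task
# definition passes to its containers.
# AWS_CLI is an assumed override for dry runs (for example, AWS_CLI=echo).
show_task_secrets() {
  task_definition="$1"   # family, family:revision, or full ARN
  "${AWS_CLI:-aws}" ecs describe-task-definition \
    --task-definition "${task_definition}" \
    --query "taskDefinition.containerDefinitions[].secrets[]"
}

# Example call (placeholder name):
# show_task_secrets my-task-family:3
```

Each entry's valueFrom field holds the name or ARN that the container agent tries to retrieve, so compare it with the parameter or secret that actually exists.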
To check the Systems Manager parameter name and ARN, complete the following steps:
Note: If you receive errors when running AWS CLI commands, make sure that you're using the most recent version of the AWS CLI.
- Open the AWS Systems Manager console.
- In the navigation pane, choose Parameter Store, and then confirm your Parameter Store name.
- To get the parameter's ARN, use the AWS CLI to run the following command. Replace name_of_parameter_store_secret with your Parameter Store secret name.
$ aws ssm get-parameter --name <name_of_parameter_store_secret> --with-decryption
Note: Parameters that reference Secrets Manager secrets can't use the Parameter Store versioning or history features. For more information, see Restrictions.
Checking stopped tasks for errors