How do I resolve a "ResourceInitializationError" when I try to pull secrets or retrieve Amazon ECR authentication for ECS tasks?

When I launch an Amazon Elastic Container Service (Amazon ECS) task, I receive a ResourceInitializationError message.

Short description

When you launch an Amazon ECS task with the Fargate launch type, you might receive one of the following error messages:

  • "ResourceInitializationError: unable to pull secrets or registry auth: pull command failed: : signal: killed"
  • "ResourceInitializationError: unable to pull secrets or registry auth: execution resource retrieval failed: unable to retrieve secret from asm: service call has been retried."
  • "ResourceInitializationError: unable to pull secrets or registry auth: execution resource retrieval failed: unable to retrieve ecr registry auth: service call has been retried 3 time(s): RequestError: send request failed caused by: Post "https://api.ecr.<region>.amazonaws.com/": dial tcp …443: i/o timeout. Please check your task network configuration."
  • "ResourceInitializationError: unable to pull secrets or registry auth: execution resource retrieval failed: unable to retrieve secret from asm: service call has been retried 5 time(s): failed to fetch secret arn:aws:secretsmanager…"
  • "ResourceInitializationError: unable to pull secrets or registry auth: execution resource retrieval failed: unable to retrieve secret from asm: service call has been retried 1 time(s): failed to fetch secret arn:aws:secretsmanager:<region>:<accountID>:secret:<secretName> from secrets manager: InvalidParameter: 1 validation error(s) found. – (minimum field size of 32/ maximum field size of 64), GetSecretValueInput.VersionId."
  • "ResourceInitializationError: unable to pull secrets or registry auth: execution resource retrieval failed: unable to retrieve secret from asm: service call has been retried 1 time(s): failed to fetch secret arn:aws:secretsmanager:<region>:<accountID>:secret:<secretName> from secrets manager: AccessDeniedException: User: arn:aws:sts::<accountID>:assumed-role/<roleName> is not authorized to perform: secretsmanager:GetSecretValue on resource: arn:aws:secretsmanager:<region>:<accountID>:secret:<secretName> because no identity-based policy allows the secretsmanager:GetSecretValue action status code: 400"
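
As a rough triage aid, the messages above can be matched on a few distinctive fragments to decide which check to start with. The following Python sketch is illustrative only: the function name first_check and the pairing of each fragment with a check are assumptions made for this example, not an official AWS mapping. Messages that match no fragment (for example, "signal: killed") return None, so you still need to work through every check.

```python
from typing import Optional

# Fragments taken from the error messages listed above. The suggested
# starting point for each fragment is an assumption made for illustration.
HINTS = [
    ("AccessDeniedException", "Check your IAM roles and permissions"),
    ("GetSecretValueInput.VersionId",
     "Check the sensitive data in the Amazon ECS task definition"),
    ("i/o timeout",
     "Check the routes, network ACLs, security groups, and VPC endpoints"),
]


def first_check(stopped_reason: str) -> Optional[str]:
    """Return a suggested starting point, or None when nothing matches."""
    if "ResourceInitializationError" not in stopped_reason:
        return None
    for fragment, suggestion in HINTS:
        if fragment in stopped_reason:
            return suggestion
    return None


if __name__ == "__main__":
    reason = ("ResourceInitializationError: unable to pull secrets or registry "
              "auth: ... dial tcp ...443: i/o timeout. Please check your task "
              "network configuration.")
    print(first_check(reason))
```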

AWS Fargate platform version 1.4.0 or later uses the task elastic network interface to pull the image and secrets. All network traffic flows through the network interface within your Amazon Virtual Private Cloud (Amazon VPC), and you can use VPC Flow Logs to view the traffic. Because Fargate places the network interface in your Amazon VPC, the task is also subject to your network configuration, such as your route tables, network ACLs, and security groups.

The Amazon ECS container agent uses the task execution AWS Identity and Access Management (IAM) role to get information from the Parameter Store, a capability of AWS Systems Manager, and AWS Secrets Manager.

For data that you encrypt with a customer managed AWS Key Management Service (AWS KMS) key, grant the following permissions to the task execution IAM role:

  • ssm:GetParameters
  • secretsmanager:GetSecretValue
  • kms:Decrypt
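
The three permissions above can be collected into one identity-based policy statement for the task execution role. The following Python sketch only builds and prints the policy document. The helper name build_execution_role_policy and the example ARNs (including the 111122223333 account ID) are placeholders assumed for illustration, so replace them with your own resources.

```python
import json


def build_execution_role_policy(secret_arn: str, parameter_arn: str,
                                kms_key_arn: str) -> dict:
    """Build a policy document that allows the three actions listed above."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "ssm:GetParameters",
                    "secretsmanager:GetSecretValue",
                    "kms:Decrypt",
                ],
                # Scope the statement to the specific resources the task needs.
                "Resource": [parameter_arn, secret_arn, kms_key_arn],
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder ARNs for illustration only.
    policy = build_execution_role_policy(
        secret_arn="arn:aws:secretsmanager:us-east-1:111122223333:secret:example-secret",
        parameter_arn="arn:aws:ssm:us-east-1:111122223333:parameter/example-parameter",
        kms_key_arn="arn:aws:kms:us-east-1:111122223333:key/example-key-id",
    )
    print(json.dumps(policy, indent=2))
```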

Resolution

Note: If you receive errors when you run AWS Command Line Interface (AWS CLI) commands, then see Troubleshooting errors for the AWS CLI. Also, make sure that you're using the most recent AWS CLI version.

Use the TroubleshootECSTaskFailedToStart runbook

Use the AWSSupport-TroubleshootECSTaskFailedToStart runbook to troubleshoot the Amazon ECS tasks that fail to start.

Important: Use the runbook in the same AWS Region where your ECS cluster resources are located. Use the most recently failed task ID so that the task state cleanup doesn't interrupt the analysis. If the failed task is part of an Amazon ECS service, then use the most recently failed task in the service. The failed task must be visible to the ecs:DescribeTasks API action when you run the automation. By default, stopped ECS tasks are visible for 1 hour after the tasks enter the Stopped state.

To run the AWSSupport-TroubleshootECSTaskFailedToStart runbook, complete the following steps:

  1. Open the AWS Systems Manager console.
  2. In the navigation pane, under Change Management, choose Automation.
  3. Choose Execute automation.
  4. Choose the Owned by Amazon tab.
  5. Under Automation document, enter TroubleshootECSTaskFailedToStart in the search bar.
  6. Select the AWSSupport-TroubleshootECSTaskFailedToStart card.
    Note: Don't select the hyperlinked automation name.
  7. Choose Next.
  8. For Execute automation document, choose Simple execution.
  9. In the Input parameters section, for AutomationAssumeRole, enter the ARN of the role that allows Automation to perform actions.
    Note: Be sure that either the service role or the IAM user or role has the required IAM permissions to run the AWSSupport-TroubleshootECSTaskFailedToStart runbook. If you don't specify an IAM role, then Automation uses the permissions of the IAM user or role that runs the runbook. For information about how to create the service role for Automation, see Task 1: Create a service role for Automation.
  10. For ClusterName, enter the name of the cluster where the task failed to start.
  11. For TaskId, enter the identification for the task that most recently failed.
  12. Choose Execute.
    Note: After execution, the analysis results are populated in the Global output section. However, wait for the document status to move to Success. Also, review the Output section for any exceptions.
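
If you prefer to start the runbook from a script, the console inputs above map to a document name and a set of parameters. The following Python sketch only assembles that request. The helper name build_automation_request is hypothetical, and the sketch assumes that Automation parameter values are passed as lists of strings. With boto3 installed and credentials configured, the result could be passed to the Systems Manager client as start_automation_execution(**request).

```python
from typing import Optional


def build_automation_request(cluster_name: str, task_id: str,
                             assume_role_arn: Optional[str] = None) -> dict:
    """Assemble the inputs for the runbook described in the steps above."""
    # Each Automation parameter value is a list of strings.
    parameters = {"ClusterName": [cluster_name], "TaskId": [task_id]}
    if assume_role_arn:
        # Optional: when omitted, Automation uses the caller's permissions.
        parameters["AutomationAssumeRole"] = [assume_role_arn]
    return {
        "DocumentName": "AWSSupport-TroubleshootECSTaskFailedToStart",
        "Parameters": parameters,
    }


if __name__ == "__main__":
    # "example-cluster" and "example-task-id" are placeholders.
    print(build_automation_request("example-cluster", "example-task-id"))
```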

You can also manually troubleshoot the issue.

Check the routes from your subnets to the internet

If your Fargate task is in a public subnet, then verify that you assigned a public IP address to the task. Also, confirm that the subnet's route table has a default route (0.0.0.0/0) to an internet gateway. When you launch a new task or create a new service, turn on Auto-assign public IP.

If you used an AWS CloudFormation stack to create your Amazon ECS service, then modify the NetworkConfiguration property for AWS::ECS::Service to update the service. To update the configuration for existing services, use CloudFormation to turn on the AssignPublicIp parameter. Or, run the following update-service AWS CLI command:

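aws ecs update-service --service serviceName --region regionName --network-configuration "awsvpcConfiguration={subnets=[subnet-123,subnet-456],securityGroups=[sg-123,sg-456],assignPublicIp=ENABLED}"

Note: Replace serviceName with your service name, regionName with your Region, and the subnet and security group IDs with your own values. If the service isn't in the default cluster, then also include the --cluster option.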
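
For scripts that call the Amazon ECS API instead of the AWS CLI, the same network settings can be expressed as a nested structure. The following Python sketch builds that structure. The helper name build_network_configuration is hypothetical, and the subnet and security group IDs are the placeholders from the command above. With boto3 installed, the result could be passed as the networkConfiguration argument of the ECS client's update_service call.

```python
from typing import Iterable


def build_network_configuration(subnets: Iterable[str],
                                security_groups: Iterable[str],
                                assign_public_ip: bool = True) -> dict:
    """Return the awsvpc network configuration for a Fargate service."""
    return {
        "awsvpcConfiguration": {
            "subnets": list(subnets),
            "securityGroups": list(security_groups),
            # The API expects the strings ENABLED or DISABLED, not a boolean.
            "assignPublicIp": "ENABLED" if assign_public_ip else "DISABLED",
        }
    }


if __name__ == "__main__":
    config = build_network_configuration(
        ["subnet-123", "subnet-456"], ["sg-123", "sg-456"])
    print(config)
```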

If you use the following configurations, then don't use the internet gateway in the public subnet to reach Secrets Manager or Systems Manager:

  • The Secrets Manager or Systems Manager VPC endpoints are in a public subnet.
  • You turned on AmazonProvidedDNS in your Amazon VPC DHCP settings.

Instead, use an Amazon VPC endpoint.

Note: You can't turn on Auto-assign public IP for existing tasks. To reconfigure existing services, use the AWS CLI and not the AWS Management Console.

If your Fargate task is in a private subnet, then verify that your task has a default route (0.0.0.0/0) to the internet connectivity source.

The internet connectivity source can be a NAT gateway, AWS PrivateLink, or a custom domain name server.

If you use a NAT gateway, then put your NAT gateway in a public subnet. For more information, see Architecture with an internet gateway and a NAT gateway using AWS Network Firewall. If you use PrivateLink, then verify that the Amazon VPC endpoints' security groups allow inbound traffic from the Fargate tasks. If you use a custom domain name server, then confirm the DNS query settings. DNS queries must have outbound access on port 53 over both UDP and TCP. The task must also have HTTPS access on port 443.
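
To check the default route from a script, you can inspect the route table that's associated with the task's subnet. The following Python sketch assumes a route table dictionary in the shape returned by the Amazon EC2 DescribeRouteTables API (a Routes list with DestinationCidrBlock, State, GatewayId, and NatGatewayId keys). The helper name default_route_target is hypothetical, and the IDs in the example are placeholders.

```python
from typing import Optional


def default_route_target(route_table: dict) -> Optional[str]:
    """Classify where the active 0.0.0.0/0 route of a route table points.

    Returns "internet-gateway", "nat-gateway", "other", or None when the
    table has no active default route.
    """
    for route in route_table.get("Routes", []):
        if route.get("DestinationCidrBlock") != "0.0.0.0/0":
            continue
        if route.get("State") != "active":
            continue  # for example, a blackhole route
        if route.get("GatewayId", "").startswith("igw-"):
            return "internet-gateway"
        if route.get("NatGatewayId", "").startswith("nat-"):
            return "nat-gateway"
        return "other"
    return None


if __name__ == "__main__":
    private_table = {"Routes": [
        {"DestinationCidrBlock": "10.0.0.0/16", "GatewayId": "local",
         "State": "active"},
        {"DestinationCidrBlock": "0.0.0.0/0", "NatGatewayId": "nat-123",
         "State": "active"},
    ]}
    print(default_route_target(private_table))
```

A public subnet is expected to report an internet gateway, and a private subnet that relies on NAT is expected to report a NAT gateway. A result of None means that the task has no default route and must reach AWS services through VPC endpoints.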

Check your network ACL and security group settings

Verify that your network access control list (network ACL) and security groups don't block outbound access to port 443 from the subnet. For more information, see Control traffic to your AWS resources using security groups.

Note: Fargate tasks must have outbound access to port 443 to allow outgoing traffic and access Amazon ECS endpoints.
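
The security group part of this check can be scripted. The following Python sketch assumes a security group dictionary in the shape returned by the Amazon EC2 DescribeSecurityGroups API (an IpPermissionsEgress list with IpProtocol, FromPort, and ToPort keys). The helper name has_outbound_443_rule is hypothetical. The sketch is deliberately simplified: it ignores the destination of each rule and doesn't evaluate network ACLs, which you must check separately.

```python
def has_outbound_443_rule(security_group: dict) -> bool:
    """Return True if any egress rule covers TCP port 443.

    Simplification: the rule's destination (CIDR, prefix list, or security
    group) is not inspected, so a True result is necessary but not sufficient.
    """
    for rule in security_group.get("IpPermissionsEgress", []):
        protocol = rule.get("IpProtocol")
        if protocol == "-1":
            return True  # "-1" means all protocols and all ports
        if protocol == "tcp":
            # Missing ports are treated as "no match" to stay conservative.
            if rule.get("FromPort", 0) <= 443 <= rule.get("ToPort", -1):
                return True
    return False


if __name__ == "__main__":
    group = {"IpPermissionsEgress": [
        {"IpProtocol": "tcp", "FromPort": 443, "ToPort": 443,
         "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
    ]}
    print(has_outbound_443_rule(group))
```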

Check your Amazon VPC endpoints

If you use PrivateLink, then you must create the following required endpoints for Fargate platform versions 1.4.0 or later:

  • com.amazonaws.region.ecr.dkr
  • com.amazonaws.region.ecr.api
  • S3 gateway endpoint
  • com.amazonaws.region.logs

For more information, see Considerations for Amazon Elastic Container Registry (Amazon ECR) VPC endpoints.

Note: If your task definition uses Secrets Manager, Parameter Store, or Amazon CloudWatch Logs, then make sure that you define endpoints. For more information, see Using an AWS Secrets Manager VPC endpoint and Creating the VPC endpoints for Amazon ECS.
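
To compare the endpoints in your VPC against the list above, you can generate the expected service names for your Region. The following Python sketch uses hypothetical helper names. It assumes that the Secrets Manager and Systems Manager endpoint service names follow the same com.amazonaws.region.service pattern (secretsmanager and ssm), and that the existing names come from the ServiceName field of the Amazon EC2 DescribeVpcEndpoints response.

```python
from typing import Iterable, List


def required_endpoint_services(region: str,
                               uses_secrets_manager: bool = False,
                               uses_parameter_store: bool = False) -> List[str]:
    """List the endpoint service names a private Fargate task needs."""
    services = [
        f"com.amazonaws.{region}.ecr.dkr",
        f"com.amazonaws.{region}.ecr.api",
        f"com.amazonaws.{region}.s3",    # created as a gateway endpoint
        f"com.amazonaws.{region}.logs",
    ]
    if uses_secrets_manager:
        services.append(f"com.amazonaws.{region}.secretsmanager")
    if uses_parameter_store:
        services.append(f"com.amazonaws.{region}.ssm")
    return services


def missing_endpoint_services(required: Iterable[str],
                              existing: Iterable[str]) -> List[str]:
    """Return the required service names that have no endpoint yet."""
    existing_set = set(existing)
    return [name for name in required if name not in existing_set]


if __name__ == "__main__":
    required = required_endpoint_services("us-east-1", uses_secrets_manager=True)
    existing = ["com.amazonaws.us-east-1.ecr.dkr", "com.amazonaws.us-east-1.s3"]
    print(missing_endpoint_services(required, existing))
```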

For PrivateLink, check that the Amazon VPC endpoint security group allows inbound traffic from the Fargate task security group or the VPC CIDR range on TCP port 443.

To confirm that the Fargate infrastructure has service access, check the VPC endpoint policies and Amazon Simple Storage Service (Amazon S3) gateway endpoint policies.

Check your IAM roles and permissions

The task execution role grants the required permissions to the Amazon ECS container and Fargate agents to make API calls for the task.

Fargate requires the task execution role when you take the following actions:

  • Pull a container image from Amazon ECR.
  • Use the awslogs log driver.
  • Use private registry authentication.
  • Use Secrets Manager secrets or Parameter Store parameters to reference sensitive data.

In the preceding scenarios, define the required permissions in your task execution role. When you access Secrets Manager secrets or Parameter Store parameters to retrieve the sensitive data, confirm that you have the required secretsmanager:GetSecretValue or ssm:GetParameters permissions. For a list of required permissions, see Secrets Manager or Systems Manager permissions.
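
As a quick first pass before you open the IAM console, you can scan a policy document for the actions that the task needs. The following Python sketch is a rough approximation and not a policy evaluator: the helper name missing_actions is hypothetical, and the sketch looks only at Allow statements and their Action patterns. It ignores Deny statements, NotAction, Resource, and Condition elements, so treat the IAM policy simulator as the authoritative check.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List


def missing_actions(policy_document: dict,
                    required_actions: Iterable[str]) -> List[str]:
    """Return required actions that no Allow statement appears to cover."""
    statements = policy_document.get("Statement", [])
    if isinstance(statements, dict):  # a single statement is also valid
        statements = [statements]

    allowed_patterns = []
    for statement in statements:
        if statement.get("Effect") != "Allow":
            continue
        actions = statement.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        # Action names are matched case-insensitively here.
        allowed_patterns.extend(action.lower() for action in actions)

    return [
        action for action in required_actions
        if not any(fnmatchcase(action.lower(), pattern)
                   for pattern in allowed_patterns)
    ]


if __name__ == "__main__":
    policy = {"Version": "2012-10-17", "Statement": [
        {"Effect": "Allow", "Action": "secretsmanager:*", "Resource": "*"},
    ]}
    print(missing_actions(
        policy, ["secretsmanager:GetSecretValue", "ssm:GetParameters"]))
```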

Check the sensitive data in the Amazon ECS task definition

Check that the secret and parameter names match the referenced names in your Amazon ECS task definition. Then, check that the values in the container definition match the values in your Amazon ECS task definition. For more information, see How can I securely pass secrets or sensitive information to containers in an Amazon ECS task?

Make sure that you configure the secret or parameter with the same ARN or name that's specified in the task definition. If the resource exists in a different Region, then you must provide the full ARN.

Use the VersionId parameter within GetSecretValueInput to specify the version of the secret value that's retrieved. If you don't require a specific version, then delete the VersionId field. Secrets Manager retrieves the latest version by default.
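
To spot a malformed version ID before you launch the task, you can inspect the secret reference in the task definition. The following Python sketch rests on two assumptions: that the reference uses the extended form arn:aws:secretsmanager:region:account:secret:name:json-key:version-stage:version-id, and that a version ID must be 32 to 64 characters long, as the validation error above indicates. The helper name check_secret_version_id is hypothetical.

```python
def check_secret_version_id(value_from: str) -> str:
    """Classify the version ID part of a Secrets Manager reference.

    Returns "not-a-secrets-manager-arn", "no-version-id", "ok",
    or "invalid-length".
    """
    parts = value_from.split(":")
    if len(parts) < 7 or parts[0] != "arn" or parts[2] != "secretsmanager":
        return "not-a-secrets-manager-arn"
    # Fields 7, 8, and 9 are json-key, version-stage, and version-id.
    version_id = parts[9] if len(parts) > 9 else ""
    if not version_id:
        return "no-version-id"  # the latest version is retrieved
    if 32 <= len(version_id) <= 64:
        return "ok"
    return "invalid-length"


if __name__ == "__main__":
    # Placeholder ARN with a version ID that is too short.
    reference = ("arn:aws:secretsmanager:us-east-1:111122223333:"
                 "secret:example-secret:::v1")
    print(check_secret_version_id(reference))
```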

If the Parameter Store parameter and task are in the same Region, then use the full ARN or the name of the parameter. If the parameter exists in a different Region, then you must specify the full ARN.
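
To review every reference at once, you can walk the secrets entries of a task definition and flag how each one resolves. The following Python sketch assumes a task definition dictionary with the containerDefinitions, secrets, name, and valueFrom keys used in task definition JSON. The helper name classify_secret_references and the example values are hypothetical.

```python
from typing import List, Tuple


def classify_secret_references(task_definition: dict,
                               task_region: str) -> List[Tuple[str, str, str]]:
    """Return (container name, secret name, kind) for every secret reference.

    kind is "name-only" (resolves only in the task's Region),
    "arn-same-region", or "arn-other-region".
    """
    results = []
    for container in task_definition.get("containerDefinitions", []):
        for secret in container.get("secrets", []):
            parts = secret.get("valueFrom", "").split(":")
            if parts[0] != "arn" or len(parts) < 6:
                kind = "name-only"
            elif parts[3] == task_region:  # field 3 of an ARN is the Region
                kind = "arn-same-region"
            else:
                kind = "arn-other-region"
            results.append((container.get("name"), secret.get("name"), kind))
    return results


if __name__ == "__main__":
    task_definition = {"containerDefinitions": [{
        "name": "app",
        "secrets": [
            {"name": "DB_PASSWORD", "valueFrom": "example-parameter"},
            {"name": "API_KEY", "valueFrom":
                "arn:aws:ssm:eu-west-1:111122223333:parameter/example-key"},
        ],
    }]}
    for row in classify_secret_references(task_definition, "us-east-1"):
        print(row)
```

A name-only reference is valid only when the parameter is in the same Region as the task, so confirm the Region of any entry that is reported that way.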

To check the parameter name and ARN, complete the following steps:

  1. Open the AWS Systems Manager console.
  2. In the navigation pane, choose Parameter Store, and then confirm your Parameter Store name.
  3. To get the parameter's ARN, run the following get-parameter AWS CLI command:
    aws ssm get-parameter --name name_of_parameter_store_secret --with-decryption
    Note: Replace name_of_parameter_store_secret with your Parameter Store secret name. Parameters that reference Secrets Manager secrets can't use the Parameter Store version or history features. For more information, see Restrictions.

Related information

Viewing Amazon ECS stopped task errors

Amazon ECS task networking options for the Fargate launch type

Amazon ECR interface VPC endpoints (AWS PrivateLink)

Using CloudWatch Logs with interface VPC endpoints

4 Comments

It amazes me how poor the product-market-fit for Fargate ECS (and Batch) tasks is. The concept is to provide users with a lightweight ability to run a docker container, without the need for an EC2 fleet, infrastructure, networking etc. (I quote from the AWS landing page: "Deploy and manage your applications, not infrastructure.")

Now, when using the service, the user must go through a cascade of infrastructure configurations: ECR, ECS clusters, ECS task definitions, ECS task create/run definition, VPCs, subnets, security groups... None of them are optional, but all are core to making Fargate run. If the user does not manipulate the routing table, you can't pull your private docker image from ECR - unless using a "globally public IP". Really? And things get even more complex when running "Batch" jobs.

It's astonishing how the value proposition has grown into a user-unfriendly journey that requires customers to hire network admins. :( Now, being tech-savvy, I understand that the above complexity might be required, even desired, in advanced use cases. But you have abandoned the entry-level users: If running a "Hello, world" example in a private (secure) environment from a private docker image takes 100 configuration steps, rather than following a 3-step launch wizard, you lose me. (Maybe a friendly analogy would be: I currently feel like installing a printer on Windows 3.1, troubleshooting whether the serial port has been connected while the BIOS loaded...)

replied 2 years ago

Thank you for your comment. We'll review and update the Knowledge Center article as needed.

AWS
MODERATOR
replied 2 years ago

The "AWSSupport-TroubleshootECSTaskFailedToStart" link is broken. https://awssupport-troubleshootecstaskfailedtostart/ is not a valid URL. So frustrating that something so complicated as getting a simple (in concept) ECS task running only to have docs that fail. +1 to the first comment above about the overwhelming complexity of doing something conceptually very simple.

replied a year ago

Thank you for your comment. We'll review and update the Knowledge Center article as needed.

AWS
EXPERT
replied a year ago