Lambda function cannot run ECS task: unable to retrieve ecr registry auth

I have run up and torn down a few ECS clusters whilst figuring out how to use them. My latest cluster has a task which runs fine when called from the ECS console "run task" option; however the task will not run if I try to run it using a python lambda function.

My Lambda code:

import boto3

def lambda_handler(event, context):

    ecs = boto3.client('ecs')

    response = ecs.run_task(
        cluster='lighthouse-run-cluster',
        taskDefinition='lighthouse-run-task-definition:5',
        launchType='FARGATE',
        networkConfiguration={
            'awsvpcConfiguration': {
                'subnets': [...],
                'securityGroups': [...]
            }
        }
    )

The function completes successfully, but the ECS task does not: it sits in PENDING for some time and then stops with this stopped reason:

ResourceInitializationError: unable to pull secrets or registry auth:
execution resource retrieval failed: unable to retrieve ecr registry auth:
service call has been retried 3 time(s):
RequestError: send request failed caused by:
Post https://api.ecr.eu-west-2.amazonaws.com/: dial tcp 52.94.49.106:443: i/o timeout

I'm guessing this is some cruft left over from a previous cluster, but I'm not sure how to find the problem. Can anyone help?
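One way to find out which kind of problem this is: read the stopped reason programmatically and classify it. An i/o timeout on dial tcp means the TCP connection to the ECR endpoint was never established. That points at networking (routes, public IP, security groups) rather than IAM. A permissions problem usually comes back as an explicit AccessDeniedException that names the missing action.

The sketch below is a rough heuristic, not something ECS reports itself. It assumes a dictionary shaped like the output of boto3's ecs.describe_tasks(). The names classify_stop and stopped_reasons, and the marker strings, are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical heuristic markers; ECS does not publish a fixed list of these strings.
NETWORK_MARKERS = ("i/o timeout", "dial tcp", "no such host", "connection refused")
PERMISSION_MARKERS = ("AccessDenied", "not authorized")


def classify_stop(reason: str) -> str:
    """Rough guess at the failure class behind a task's stoppedReason text."""
    # Check permissions first: an IAM error message can also mention a host.
    if any(marker in reason for marker in PERMISSION_MARKERS):
        return "permissions"
    if any(marker in reason for marker in NETWORK_MARKERS):
        return "network"
    return "unknown"


def stopped_reasons(describe_tasks_response: Dict) -> List[Tuple[str, str, str]]:
    """(taskArn, stoppedReason, guess) for each STOPPED task in a response
    shaped like the output of boto3's ecs.describe_tasks()."""
    results = []
    for task in describe_tasks_response.get("tasks", []):
        if task.get("lastStatus") != "STOPPED":
            continue
        reason = task.get("stoppedReason", "")
        results.append((task["taskArn"], reason, classify_stop(reason)))
    return results
```

For example, once the task has stopped, call stopped_reasons(ecs.describe_tasks(cluster='lighthouse-run-cluster', tasks=[task_arn])). Here task_arn is response['tasks'][0]['taskArn'] from the Lambda's run_task response.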

Asked 2 years ago, viewed 1,566 times
2 Answers

Hi Simon,

It looks like this is either an IAM permissions issue or a security group issue. I'd recommend making sure that your ecsTaskExecutionRole has the following permissions:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "ecr:GetAuthorizationToken",
                "ecr:BatchCheckLayerAvailability",
                "ecr:GetDownloadUrlForLayer",
                "ecr:BatchGetImage",
                "logs:CreateLogStream",
                "logs:PutLogEvents"
            ],
            "Resource": "*"
        }
    ]
}
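You don't have to edit policies blindly. The IAM policy simulator can report which of these actions the role actually allows.

This is a minimal sketch, with some assumptions:
- An IAM client is passed in (boto3.client('iam')), and the caller has iam:SimulatePrincipalPolicy.
- denied_actions is a hypothetical helper name.
- It evaluates only the role's own identity-based policies, not the ECR repository policy.
- It assumes all results fit in a single response.

```python
from typing import Iterable, List

# Actions from the execution-role policy above.
REQUIRED_ACTIONS = [
    "ecr:GetAuthorizationToken",
    "ecr:BatchCheckLayerAvailability",
    "ecr:GetDownloadUrlForLayer",
    "ecr:BatchGetImage",
    "logs:CreateLogStream",
    "logs:PutLogEvents",
]


def denied_actions(iam_client, role_arn: str,
                   actions: Iterable[str] = REQUIRED_ACTIONS) -> List[str]:
    """Return the actions that the role's identity-based policies do not allow.

    iam_client is expected to be boto3.client('iam'). Only the first response
    is read; pagination (IsTruncated/Marker) is not handled.
    """
    response = iam_client.simulate_principal_policy(
        PolicySourceArn=role_arn,
        ActionNames=list(actions),
    )
    return [
        result["EvalActionName"]
        for result in response["EvaluationResults"]
        # EvalDecision is 'allowed', 'explicitDeny' or 'implicitDeny'.
        if result["EvalDecision"] != "allowed"
    ]
```

For example, if denied_actions(boto3.client('iam'), 'arn:aws:iam::ACCOUNT_ID:role/ecsTaskExecutionRole') returns an empty list, the role's own policies are not what is blocking the pull.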

If that doesn't work, try adding those same permissions to your Lambda function's role as well.

You may also need to add this permission to your ecsTaskRole:

{
    "Action": "ecr:GetAuthorizationToken",
    "Effect": "Allow",
    "Resource": "*"
}

If those fail, check that the security group attached to the task (the one passed in securityGroups to run_task) allows outbound traffic on port 443. If that also fails, check the permissions associated with your ECR repository:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowCrossAccountPullTest",
      "Effect": "Allow",
      "Principal": {
        "AWS": [
          "arn:aws:iam::ACCOUNT_ID:role/ecsTaskExecutionRoleNAME"
        ]
      },
      "Action": [
        "ecr:BatchCheckLayerAvailability",
        "ecr:BatchGetImage",
        "ecr:GetDownloadUrlForLayer"
      ]
    }
  ]
}
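The i/o timeout is raised from the task's own network interface. So for the security group check, the egress rules that matter are those of the groups passed as securityGroups to run_task.

This sketch checks one group for an outbound rule that lets TCP 443 reach 0.0.0.0/0. The group is assumed to be one element of ec2.describe_security_groups()['SecurityGroups']. The helper name allows_https_egress is hypothetical. It deliberately ignores IPv6 ranges, prefix lists, and rules that reference other security groups (as you might use with VPC endpoints).

```python
from typing import Dict


def allows_https_egress(group: Dict) -> bool:
    """True if the group has an outbound rule covering TCP 443 to 0.0.0.0/0.

    `group` is one element of describe_security_groups()['SecurityGroups'].
    IPv6 ranges, prefix lists and security-group references are not considered.
    """
    for rule in group.get("IpPermissionsEgress", []):
        protocol = rule.get("IpProtocol")
        if protocol == "-1":
            port_ok = True  # "-1" means all protocols and all ports
        elif protocol in ("tcp", "6"):
            port_ok = rule.get("FromPort", 0) <= 443 <= rule.get("ToPort", 65535)
        else:
            continue
        ranges = rule.get("IpRanges", [])
        if port_ok and any(r.get("CidrIp") == "0.0.0.0/0" for r in ranges):
            return True
    return False
```

A default security group has an allow-all egress rule and passes this check. So a failure here usually means someone tightened the outbound rules.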

If all else fails, take a look at this Stack Overflow question, which describes a similar issue: https://stackoverflow.com/questions/61265108/aws-ecs-fargate-resourceinitializationerror-unable-to-pull-secrets-or-registry

It looks like this could be a networking issue: if your tasks are launched without internet access, they can't reach ECR. Hope this helps!
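For Fargate, internet access depends on the task's subnet and its public IP together:
- If the subnet's default route goes to an internet gateway, the task needs a public IP.
- If the subnet is routed through a NAT gateway, no public IP is needed.
- If the subnet has neither, it needs VPC interface endpoints for ECR, plus an S3 gateway endpoint for the image layers.

The sketch below encodes that reasoning. It assumes route tables as returned by ec2.describe_route_tables() filtered on association.subnet-id. The function names are hypothetical, and it does not handle NAT instances or detect VPC endpoints.

```python
from typing import Dict, List


def default_route_target(route_tables: List[Dict]) -> str:
    """Return 'igw', 'nat' or 'none' for the 0.0.0.0/0 route in the given tables.

    If filtering on association.subnet-id returns no tables, the subnet uses
    the VPC's main route table, which has to be looked up separately.
    """
    for table in route_tables:
        for route in table.get("Routes", []):
            if route.get("DestinationCidrBlock") != "0.0.0.0/0":
                continue
            if route.get("GatewayId", "").startswith("igw-"):
                return "igw"
            if route.get("NatGatewayId"):
                return "nat"
    return "none"


def explain_reachability(route_target: str, assign_public_ip: bool) -> str:
    """Rough verdict on whether a Fargate task can reach the public ECR endpoint."""
    if route_target == "nat":
        return "ok: outbound traffic leaves through a NAT gateway"
    if route_target == "igw":
        if assign_public_ip:
            return "ok: public subnet and the task gets a public IP"
        return "blocked: public subnet but no public IP (assignPublicIp is DISABLED)"
    return "no default route: only works if VPC endpoints for ECR and S3 exist"
```

If the sandbox subnets are routed to an internet gateway, this would explain why setting assignPublicIp to ENABLED in the next answer got the task running.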

Answered 2 years ago
  • None of this made any difference. Thanks, though.

I fixed this by adding 'assignPublicIp': 'ENABLED' to the awsvpcConfiguration in networkConfiguration:

    response = ecs.run_task(
        cluster='lighthouse-run-cluster',
        taskDefinition='lighthouse-run-task-definition:5',
        launchType='FARGATE',
        networkConfiguration={
            'awsvpcConfiguration': {
                'subnets': [...],
                'securityGroups': [...],
                #
                # Added the flag below
                #
                'assignPublicIp': 'ENABLED'
            }
        }
    )

I think this is down to a config problem in our sandbox and it shouldn't be needed in production, but it got me moving.
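If the flag should differ between the sandbox and production, one option is to pass it in as a parameter rather than hard-coding it. This is a minimal sketch that reuses the cluster and task definition names from the question. build_run_task_kwargs, assign_public_ip_from_env and the ASSIGN_PUBLIC_IP environment variable are hypothetical. ECS expects the strings ENABLED or DISABLED for assignPublicIp.

```python
import os
from typing import Dict, List


def build_run_task_kwargs(subnets: List[str], security_groups: List[str],
                          assign_public_ip: bool) -> Dict:
    """Keyword arguments for ecs.run_task() with a switchable assignPublicIp."""
    return {
        "cluster": "lighthouse-run-cluster",
        "taskDefinition": "lighthouse-run-task-definition:5",
        "launchType": "FARGATE",
        "networkConfiguration": {
            "awsvpcConfiguration": {
                "subnets": subnets,
                "securityGroups": security_groups,
                # ECS expects the strings 'ENABLED' or 'DISABLED' here.
                "assignPublicIp": "ENABLED" if assign_public_ip else "DISABLED",
            }
        },
    }


def assign_public_ip_from_env() -> bool:
    # Hypothetical environment variable, set differently on each stage's Lambda.
    return os.environ.get("ASSIGN_PUBLIC_IP", "false").lower() == "true"
```

In the handler this becomes ecs.run_task(**build_run_task_kwargs(subnets, security_groups, assign_public_ip_from_env())), where subnets and security_groups are the same lists as before.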

Answered 2 years ago
