Lambda function cannot run ECS task: unable to retrieve ecr registry auth


I have run up and torn down a few ECS clusters whilst figuring out how to use them. My latest cluster has a task which runs fine when started from the ECS console "Run task" option; however, the task will not run if I try to start it from a Python Lambda function.

My Lambda code:

import boto3

def lambda_handler(event, context):

    ecs = boto3.client('ecs')

    response = ecs.run_task(
        cluster='lighthouse-run-cluster',
        taskDefinition='lighthouse-run-task-definition:5',
        launchType='FARGATE',
        networkConfiguration={
            'awsvpcConfiguration': {
                'subnets': [...],
                'securityGroups': [...]
            }
        }
    )

The function completes successfully, however the ECS task does not. The task sits in PENDING for some time then stops, with the stopped reason:

ResourceInitializationError: unable to pull secrets or registry auth:
execution resource retrieval failed: unable to retrieve ecr registry auth:
service call has been retried 3 time(s):
RequestError: send request failed caused by:
Post https://api.ecr.eu-west-2.amazonaws.com/: dial tcp 52.94.49.106:443: i/o timeout

I'm guessing this is some cruft left over from a previous cluster, but I'm not sure how to find the problem. Can anyone help?
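Because `run_task` returns as soon as ECS accepts the request, the Lambda finishing successfully says nothing about whether the container later started. A minimal sketch for reading the stopped reason back programmatically, assuming a boto3 ECS client (`boto3.client('ecs')`) is passed in; `get_stopped_reasons` is a hypothetical helper name:

```python
from typing import Dict, List


def get_stopped_reasons(ecs_client, cluster: str, task_arns: List[str]) -> Dict[str, str]:
    """Return {task ARN: stoppedReason} for the given tasks that have stopped.

    ecs_client is assumed to be a boto3 ECS client. Tasks that are still
    PROVISIONING, PENDING or RUNNING are left out of the result.
    """
    if not task_arns:
        return {}
    # DescribeTasks accepts up to 100 task ARNs per call.
    response = ecs_client.describe_tasks(cluster=cluster, tasks=task_arns)
    reasons = {}
    for task in response.get("tasks", []):
        if task.get("lastStatus") == "STOPPED":
            reasons[task["taskArn"]] = task.get("stoppedReason", "")
    return reasons
```

The ARNs come from `response['tasks']` returned by `run_task`. It is also worth checking `response['failures']`, which is non-empty when ECS rejects the request outright rather than starting and then stopping the task.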

Asked 2 years ago · 1,504 views
2 Answers

Hi Simon,

It looks like this is either an IAM permissions problem or a security group problem. I'd recommend making sure that your ecsTaskExecutionRole has the following permissions:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "ecr:GetAuthorizationToken",
                "ecr:BatchCheckLayerAvailability",
                "ecr:GetDownloadUrlForLayer",
                "ecr:BatchGetImage",
                "logs:CreateLogStream",
                "logs:PutLogEvents"
            ],
            "Resource": "*"
        }
    ]
}
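One way to confirm the role really grants these actions, rather than reading the attached policies by eye, is the IAM policy simulator. A minimal sketch, assuming a boto3 IAM client (`boto3.client('iam')`) is passed in; `denied_actions` is a hypothetical helper, and the simulation covers only the role's identity-based policies against resource `*`:

```python
from typing import List, Sequence

# The ECR pull actions from the policy above.
ECR_PULL_ACTIONS = (
    "ecr:GetAuthorizationToken",
    "ecr:BatchCheckLayerAvailability",
    "ecr:GetDownloadUrlForLayer",
    "ecr:BatchGetImage",
)


def denied_actions(iam_client, role_arn: str,
                   actions: Sequence[str] = ECR_PULL_ACTIONS) -> List[str]:
    """Return the actions the IAM policy simulator does not allow for role_arn."""
    response = iam_client.simulate_principal_policy(
        PolicySourceArn=role_arn,
        ActionNames=list(actions),
    )
    # EvalDecision is "allowed", "explicitDeny" or "implicitDeny".
    return [
        result["EvalActionName"]
        for result in response["EvaluationResults"]
        if result["EvalDecision"] != "allowed"
    ]
```

An empty list means the execution role is not the problem, which points back towards networking.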

If that doesn't work, try adding those same permissions to your Lambda function's role as well.

You may also need to add this permission to your ecsTaskRole:

{
    "Action": "ecr:GetAuthorizationToken",
    "Effect": "Allow",
    "Resource": "*"
}

If those fail, check that the security group associated with your Lambda function allows outbound traffic on port 443. If that also fails, check the permissions on your ECR repository:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowCrossAccountPullTest",
      "Effect": "Allow",
      "Principal": {
        "AWS": [
          "arn:aws:iam::ACCOUNT_ID:role/ecsTaskExecutionRoleNAME"
        ]
      },
      "Action": [
        "ecr:BatchCheckLayerAvailability",
        "ecr:BatchGetImage",
        "ecr:GetDownloadUrlForLayer"
      ]
    }
  ]
}
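The port 443 check mentioned above can be run against the output of `ec2.describe_security_groups`. For the image pull itself, the groups that matter are the ones passed in `securityGroups` to `run_task`, since those are attached to the task's network interface. A minimal sketch; `allows_https_egress` is a hypothetical helper, and it deliberately only recognises rules open to `0.0.0.0/0` (narrower CIDRs and prefix lists are ignored):

```python
from typing import Any, Dict


def allows_https_egress(security_group: Dict[str, Any]) -> bool:
    """Return True if an egress rule permits TCP 443 to 0.0.0.0/0.

    security_group is one entry of the "SecurityGroups" list returned by
    ec2.describe_security_groups.
    """
    for rule in security_group.get("IpPermissionsEgress", []):
        protocol = rule.get("IpProtocol")
        if protocol == "-1":
            port_ok = True  # "-1" means all protocols and all ports
        elif protocol in ("tcp", "6"):
            port_ok = rule.get("FromPort", 0) <= 443 <= rule.get("ToPort", 65535)
        else:
            port_ok = False
        open_to_all = any(r.get("CidrIp") == "0.0.0.0/0" for r in rule.get("IpRanges", []))
        if port_ok and open_to_all:
            return True
    return False
```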

If all else fails, take a look at this Stack Overflow question, which describes a similar issue: https://stackoverflow.com/questions/61265108/aws-ecs-fargate-resourceinitializationerror-unable-to-pull-secrets-or-registry

It could also be a networking issue with how you're launching your tasks: if they have no route to the internet, they can't reach ECR. Hope this helps!
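Whether a task can reach ECR depends on where its subnet's default route points. If `0.0.0.0/0` goes to an internet gateway, a task without a public IP cannot use it; if it goes to a NAT gateway, a private IP is sufficient. A minimal sketch that classifies the route, assuming the caller has fetched the subnet's route tables with `ec2.describe_route_tables` filtered on `association.subnet-id` (falling back to the VPC's main table, filter `association.main`, if that returns nothing); `default_route_target` is a hypothetical helper:

```python
from typing import Any, Dict, List


def default_route_target(route_tables: List[Dict[str, Any]]) -> str:
    """Classify where a subnet's 0.0.0.0/0 route goes.

    route_tables is the "RouteTables" list from ec2.describe_route_tables.
    Returns "igw", "nat", "blackhole", "other" or "none".
    """
    for table in route_tables:
        for route in table.get("Routes", []):
            if route.get("DestinationCidrBlock") != "0.0.0.0/0":
                continue
            if route.get("State") == "blackhole":
                return "blackhole"  # target was deleted, traffic is dropped
            if route.get("NatGatewayId"):
                return "nat"  # a private IP is enough to reach ECR
            if route.get("GatewayId", "").startswith("igw-"):
                return "igw"  # the task needs a public IP to use this route
            return "other"  # e.g. a transit gateway or NAT instance
    return "none"  # no internet route at all
```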

Answered 2 years ago
  • None of this made any difference. Thanks though.


I fixed this by adding 'assignPublicIp': 'ENABLED' to the awsvpcConfiguration in networkConfiguration:

    response = ecs.run_task(
        cluster='lighthouse-run-cluster',
        taskDefinition='lighthouse-run-task-definition:5',
        launchType='FARGATE',
        networkConfiguration={
            'awsvpcConfiguration': {
                'subnets': [...],
                'securityGroups': [...],
                #
                # Added the flag below
                #
                'assignPublicIp': 'ENABLED'
            }
        }
    )

I think this is down to a config problem in our sandbox and it shouldn't be needed in production, but it got me moving.
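Since the flag is expected to differ between the sandbox and production, one option is to drive it from configuration instead of hard-coding it. A minimal sketch; `build_network_configuration` is a hypothetical helper, and the subnet and security group IDs are still supplied by the caller:

```python
from typing import Any, Dict, List


def build_network_configuration(
    subnets: List[str],
    security_groups: List[str],
    assign_public_ip: bool,
) -> Dict[str, Any]:
    """Build the networkConfiguration argument for a Fargate ecs.run_task call."""
    if not subnets:
        raise ValueError("awsvpc tasks need at least one subnet")
    return {
        "awsvpcConfiguration": {
            "subnets": list(subnets),
            "securityGroups": list(security_groups),
            # 'ENABLED' is needed when the subnet routes 0.0.0.0/0 to an internet
            # gateway; with a NAT gateway route, 'DISABLED' is normally enough.
            "assignPublicIp": "ENABLED" if assign_public_ip else "DISABLED",
        }
    }
```

In the handler this would be passed as `networkConfiguration=build_network_configuration(subnets, security_groups, os.environ.get('ASSIGN_PUBLIC_IP') == 'true')`, where `ASSIGN_PUBLIC_IP` is a hypothetical environment variable set only on the sandbox Lambda.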

Answered 2 years ago
