Lambda function cannot run ECS task: unable to retrieve ecr registry auth


I have run up and torn down a few ECS clusters whilst figuring out how to use them. My latest cluster has a task which runs fine when started from the ECS console's "Run task" option; however, the task will not run if I try to start it from a Python Lambda function.

My lambda code:

import boto3

def lambda_handler(event, context):

    ecs = boto3.client('ecs')

    response = ecs.run_task(
        cluster='lighthouse-run-cluster',
        taskDefinition='lighthouse-run-task-definition:5',
        launchType='FARGATE',
        networkConfiguration={
            'awsvpcConfiguration': {
                'subnets': [...],
                'securityGroups': [...]
            }
        }
    )

The function completes successfully, but the ECS task does not. The task sits in PENDING for some time and then stops, with the stopped reason:

ResourceInitializationError: unable to pull secrets or registry auth:
execution resource retrieval failed: unable to retrieve ecr registry auth:
service call has been retried 3 time(s):
RequestError: send request failed caused by:
Post https://api.ecr.eu-west-2.amazonaws.com/: dial tcp 52.94.49.106:443: i/o timeout

I'm guessing this is some cruft left over from a previous cluster, but I'm not sure how to find the problem. Can anyone help?
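Because run_task only starts the task, the Lambda succeeding says nothing about whether the container actually launched. A minimal sketch for reading the stopped reason programmatically, assuming you already have the task ARN from run_task's response (response['tasks'][0]['taskArn']) and that `ecs` is a boto3 ECS client; the helper name wait_for_stopped_reason is hypothetical:

```python
import time


def wait_for_stopped_reason(ecs, cluster, task_arn, poll_seconds=10, max_polls=60):
    """Poll describe_tasks until the task is STOPPED, then return its stoppedReason.

    `ecs` is a boto3 ECS client (or any object with a compatible describe_tasks).
    Returns None if the task has not stopped after max_polls attempts.
    """
    for _ in range(max_polls):
        resp = ecs.describe_tasks(cluster=cluster, tasks=[task_arn])
        tasks = resp.get("tasks", [])
        if tasks and tasks[0].get("lastStatus") == "STOPPED":
            return tasks[0].get("stoppedReason")
        time.sleep(poll_seconds)  # still PENDING/RUNNING; wait and ask again
    return None
```

A task can sit in PENDING for minutes, so this is probably better run from a local shell than inside the Lambda itself. boto3 also ships a built-in waiter for the same purpose, ecs.get_waiter('tasks_stopped').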

Asked 2 years ago, 1,509 views
2 answers

Hi Simon,

It looks like this is either an IAM permissions issue or a security group issue. I'd recommend making sure that your ecsTaskExecutionRole has the following permissions:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "ecr:GetAuthorizationToken",
                "ecr:BatchCheckLayerAvailability",
                "ecr:GetDownloadUrlForLayer",
                "ecr:BatchGetImage",
                "logs:CreateLogStream",
                "logs:PutLogEvents"
            ],
            "Resource": "*"
        }
    ]
}
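If you want to check an existing policy document against that list rather than eyeballing it, here is a rough sketch. It is a simplified check under stated assumptions: it only looks at Allow statements, handles wildcards such as "ecr:*", and ignores Deny statements, Conditions, NotAction and Resource scoping. The function and constant names are hypothetical:

```python
from fnmatch import fnmatchcase

# The actions listed in the policy above.
REQUIRED_EXECUTION_ACTIONS = [
    "ecr:GetAuthorizationToken",
    "ecr:BatchCheckLayerAvailability",
    "ecr:GetDownloadUrlForLayer",
    "ecr:BatchGetImage",
    "logs:CreateLogStream",
    "logs:PutLogEvents",
]


def _as_list(value):
    # IAM allows both a single value and a list for Statement and Action.
    return value if isinstance(value, list) else [value]


def missing_actions(policy, required=REQUIRED_EXECUTION_ACTIONS):
    """Return the required actions not matched by any Allow statement in `policy`.

    Simplified: ignores Deny, Condition, NotAction and Resource scoping.
    """
    allowed = []
    for stmt in _as_list(policy.get("Statement", [])):
        if stmt.get("Effect") == "Allow":
            # IAM action names are case-insensitive, so compare in lower case.
            allowed.extend(a.lower() for a in _as_list(stmt.get("Action", [])))
    return [
        action
        for action in required
        if not any(fnmatchcase(action.lower(), pattern) for pattern in allowed)
    ]
```

An empty result means every action above is at least nominally allowed; a non-empty one lists what to add.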

If that doesn't work, try adding those same permissions to your Lambda function's role as well.

You may also need to add this permission to your ecsTaskRole:

{
    "Action": "ecr:GetAuthorizationToken",
    "Effect": "Allow",
    "Resource": "*"
}

If those don't help, check that the security group associated with your Lambda function allows traffic on port 443. If that still fails, check the repository policy on your ECR repository:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowCrossAccountPullTest",
      "Effect": "Allow",
      "Principal": {
        "AWS": [
          "arn:aws:iam::ACCOUNT_ID:role/ecsTaskExecutionRoleNAME"
        ]
      },
      "Action": [
        "ecr:BatchCheckLayerAvailability",
        "ecr:BatchGetImage",
        "ecr:GetDownloadUrlForLayer"
      ]
    }
  ]
}
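A repository policy like this one mainly matters when the image lives in a different account; for same-account pulls the execution role's IAM permissions are generally enough. If you do need one, here is a sketch of building it as a Python dict and applying it with boto3's ECR set_repository_policy call, which avoids hand-editing JSON. The repository name and role ARNs you pass in are placeholders, and the helper names are hypothetical:

```python
import json


def build_pull_policy(role_arns):
    """Build an ECR repository policy that lets the given IAM roles pull images."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowPull",
                "Effect": "Allow",
                "Principal": {"AWS": list(role_arns)},
                "Action": [
                    "ecr:BatchCheckLayerAvailability",
                    "ecr:BatchGetImage",
                    "ecr:GetDownloadUrlForLayer",
                ],
            }
        ],
    }


def apply_pull_policy(ecr, repository_name, role_arns):
    """Attach the policy using a boto3 ECR client; returns the API response."""
    return ecr.set_repository_policy(
        repositoryName=repository_name,
        policyText=json.dumps(build_pull_policy(role_arns)),
    )
```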

If all else fails, this Stack Overflow question describes what appears to be the same issue: https://stackoverflow.com/questions/61265108/aws-ecs-fargate-resourceinitializationerror-unable-to-pull-secrets-or-registry

It could also be a networking issue: if your tasks are launched without internet access, they can't reach ECR. Hope this helps!
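To narrow down a networking cause, two things are worth checking: where the route table for the task's subnet (or the VPC's main route table, if the subnet has no explicit association) sends 0.0.0.0/0 traffic, and whether the security groups passed in run_task's awsvpcConfiguration (the ones attached to the Fargate task's network interface) allow outbound TCP 443. A simplified sketch that inspects entries from describe_route_tables and describe_security_groups; it does not handle NAT instances, transit gateways or destination CIDRs, and the function names are hypothetical:

```python
# What each default-route target usually means for a Fargate task pulling from ECR.
HINTS = {
    "igw": "public subnet: a Fargate task needs assignPublicIp ENABLED to reach ECR",
    "nat": "private subnet behind a NAT gateway: assignPublicIp can stay DISABLED",
    None: "no default route: the task needs a NAT gateway or VPC endpoints to reach ECR",
}


def default_route_target(route_table):
    """Classify where a describe_route_tables entry sends 0.0.0.0/0: 'igw', 'nat' or None."""
    for route in route_table.get("Routes", []):
        if route.get("DestinationCidrBlock") != "0.0.0.0/0":
            continue
        if route.get("State") == "blackhole":  # points at a deleted gateway
            continue
        if route.get("GatewayId", "").startswith("igw-"):
            return "igw"
        if route.get("NatGatewayId"):
            return "nat"
    return None


def allows_https_egress(security_group):
    """True if a describe_security_groups entry has an egress rule covering TCP 443.

    Only protocol and port are checked, not the destination CIDR or prefix list.
    """
    for rule in security_group.get("IpPermissionsEgress", []):
        protocol = rule.get("IpProtocol")
        if protocol == "-1":  # all protocols, all ports
            return True
        if protocol == "tcp" and rule.get("FromPort", 0) <= 443 <= rule.get("ToPort", -1):
            return True
    return False
```

For example, HINTS[default_route_target(rt)] gives a one-line reading of a route table rt.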

Answered 2 years ago
  • None of this made any difference. Thanks though.


I fixed this by adding 'assignPublicIp': 'ENABLED' to the networkConfiguration:

    response = ecs.run_task(
        cluster='lighthouse-run-cluster',
        taskDefinition='lighthouse-run-task-definition:5',
        launchType='FARGATE',
        networkConfiguration={
            'awsvpcConfiguration': {
                'subnets': [...],
                'securityGroups': [...],
                #
                # Added the flag below
                #
                'assignPublicIp': 'ENABLED'
            }
        }
    )

I think this is down to a config problem in our sandbox and it shouldn't be needed in production, but it got me moving.
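Since the flag may differ between sandbox and production, one option is to make it a parameter instead of hard-coding it. A sketch under that assumption, reusing the cluster and task definition names from the question; the helper names are hypothetical, and `ecs` is a boto3 ECS client. It also surfaces run_task's `failures` list, which covers placement problems detected up front, whereas image-pull errors like the one above only appear later as the task's stoppedReason:

```python
def build_network_configuration(subnets, security_groups, assign_public_ip):
    """Build run_task's networkConfiguration; assign_public_ip toggles assignPublicIp."""
    return {
        "awsvpcConfiguration": {
            "subnets": list(subnets),
            "securityGroups": list(security_groups),
            "assignPublicIp": "ENABLED" if assign_public_ip else "DISABLED",
        }
    }


def start_lighthouse_task(ecs, subnets, security_groups, assign_public_ip):
    """Start the task and return the new task ARNs, raising if run_task reports failures."""
    response = ecs.run_task(
        cluster="lighthouse-run-cluster",
        taskDefinition="lighthouse-run-task-definition:5",
        launchType="FARGATE",
        networkConfiguration=build_network_configuration(
            subnets, security_groups, assign_public_ip
        ),
    )
    if response.get("failures"):
        raise RuntimeError(f"run_task reported failures: {response['failures']}")
    return [task["taskArn"] for task in response.get("tasks", [])]
```

Inside lambda_handler you could then pass assign_public_ip from a Lambda environment variable (for example a hypothetical ASSIGN_PUBLIC_IP set to "true" only in the sandbox).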

Answered 2 years ago
