CannotPullContainerError in a public VPC/subnet. What am I missing or doing wrong?


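[The following question has been translated] I created a brand-new AWS account just to troubleshoot this issue. The default VPC and subnets in every region of this account are pristine and unmodified.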

This is the default VPC in us-east-1:

$ aws ec2 describe-vpcs
{
    "Vpcs": [
        {
            "CidrBlock": "172.31.0.0/16",
            "DhcpOptionsId": "dopt-095a7873b289557a1",
            "State": "available",
            "VpcId": "vpc-08ba51697a37c5ad9",
            "OwnerId": "...",
            "InstanceTenancy": "default",
            "CidrBlockAssociationSet": [
                {
                    "AssociationId": "vpc-cidr-assoc-0dba5df7b176877b7",
                    "CidrBlock": "172.31.0.0/16",
                    "CidrBlockState": {
                        "State": "associated"
                    }
                }
            ],
            "IsDefault": true
        }
    ]
}

This is the route table for this VPC:

$ aws ec2 describe-route-tables --filters Name=vpc-id,Values=vpc-08ba51697a37c5ad9
{
    "RouteTables": [
        {
            "Associations": [
                {
                    "Main": true,
                    "RouteTableAssociationId": "rtbassoc-08e6f9833f341f6c4",
                    "RouteTableId": "rtb-000d61d5d0236d276",
                    "AssociationState": {
                        "State": "associated"
                    }
                }
            ],
            "PropagatingVgws": [],
            "RouteTableId": "rtb-000d61d5d0236d276",
            "Routes": [
                {
                    "DestinationCidrBlock": "172.31.0.0/16",
                    "GatewayId": "local",
                    "Origin": "CreateRouteTable",
                    "State": "active"
                },
                {
                    "DestinationCidrBlock": "0.0.0.0/0",
                    "GatewayId": "igw-0b7ed209f5cd38fa6",
                    "Origin": "CreateRoute",
                    "State": "active"
                }
            ],
            "Tags": [],
            "VpcId": "vpc-08ba51697a37c5ad9",
            "OwnerId": "..."
        }
    ]
}

As you can see, the second route allows egress to the internet:

{
    "DestinationCidrBlock": "0.0.0.0/0",
    "GatewayId": "igw-0b7ed209f5cd38fa6",
    "Origin": "CreateRoute",
    "State": "active"
}

So I assumed that an ECS Fargate task deployed in this VPC would be able to pull amazoncorretto:17-alpine3.15 from docker.io.

However, whenever I deploy the CloudFormation stack, ECS fails to run the scheduled task because it cannot pull the image from Docker Hub. It reports this error:

CannotPullContainerError: inspect image has been retried 5 time(s): failed to resolve ref "docker.io/library/amazoncorretto:17-alpine3.15": failed to do request: Head https://registry-1.docker.io/v2/library/amazoncorretto/manifests/17-alpine3.15: dial ...

Here is my CloudFormation template. I have intentionally granted broad, open permissions to every role involved, to rule out insufficient IAM permissions as the cause:

AWSTemplateFormatVersion: "2010-09-09"
Description: ECS Cron Task
Parameters:
  AppName:
    Type: String
    Default: CronTask

  AppImage:
    Type: String
    Default: amazoncorretto:17-alpine3.15

  AppLogGroup:
    Type: String
    Default: ECS

  AppLogPrefix:
    Type: String
    Default: CronTask

  ScheduledTaskSubnets:
    Type: List<AWS::EC2::Subnet::Id>
    Default: "subnet-0031a6eaf7e52173c, subnet-01950a0d2d1e04dc1, subnet-0a1aa70f0421e2025, subnet-036abb95995a86c73, subnet-0f8b5043babfb9a7e, subnet-07cb2210ce2d5bb8f"

Resources:
  Cluster:
    Type: AWS::ECS::Cluster

  TaskRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: "2012-10-17"
        Statement:
          - Action: sts:AssumeRole
            Effect: Allow
            Principal:
              Service: ecs-tasks.amazonaws.com
      Policies:
        - PolicyName: AdminAccess
          PolicyDocument:
            Version: "2012-10-17"
            Statement:
              - Action: "*"
                Effect: Allow
                Resource: "*"

  TaskExecutionRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: "2012-10-17"
        Statement:
          - Action: sts:AssumeRole
            Effect: Allow
            Principal:
              Service: ecs-tasks.amazonaws.com
      Path: /
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy
      Policies:
        - PolicyName: AdminAccess
          PolicyDocument:
            Version: "2012-10-17"
            Statement:
              - Action: "*"
                Effect: Allow
                Resource: "*"

  TaskScheduleRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Statement:
          - Action: sts:AssumeRole
            Effect: Allow
            Principal:
              Service: events.amazonaws.com
      Path: /
      Policies:
        - PolicyName: AdminAccess
          PolicyDocument:
            Statement:
              - Action: "*"
                Effect: Allow
                Resource: "*"

  TaskDefinition:
    Type: AWS::ECS::TaskDefinition
    Properties:
      Cpu: 256
      Memory: 512
      NetworkMode: awsvpc
      TaskRoleArn: !Ref TaskRole
      ExecutionRoleArn: !Ref TaskExecutionRole
      Family: !Ref AppName
      RequiresCompatibilities:
        - FARGATE
      ContainerDefinitions:
        - Name: !Ref AppName
          Image: !Ref AppImage
          Command: ["java", "--version"]
          Essential: true
          LogConfiguration:
            LogDriver: awslogs
            Options:
              awslogs-create-group: true
              awslogs-group: !Ref AppLogGroup
              awslogs-region: !Ref "AWS::Region"
              awslogs-stream-prefix: !Ref AppLogPrefix

  TaskSchedule:
    Type: AWS::Events::Rule
    DependsOn: 
      - TaskScheduleRole
      - DeadLetterQueue
    Properties:
      Description: Trigger the task once every minute
      ScheduleExpression: cron(0/1 * * * ? *)
      State: ENABLED
      Targets:
        - Arn: !GetAtt Cluster.Arn
          Id: ClusterTarget
          RoleArn: !GetAtt TaskScheduleRole.Arn
          DeadLetterConfig:
            Arn: !GetAtt DeadLetterQueue.Arn
          EcsParameters:
            LaunchType: FARGATE
            TaskCount: 1
            TaskDefinitionArn: !Ref TaskDefinition
            NetworkConfiguration:
              AwsVpcConfiguration:
                Subnets: !Ref ScheduledTaskSubnets

  DeadLetterQueue:
    Type: AWS::SQS::Queue
    Properties:
      QueueName: "CronTaskDeadLetterQueue"

  DeadLetterQueuePolicy:
    Type: AWS::SQS::QueuePolicy
    Properties:
      Queues:
        - !Ref DeadLetterQueue
      PolicyDocument:
        Statement:
          - Action: "*"
            Effect: Allow
            Resource: "*"

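What am I missing here? Why can't AWS pull the image from docker.io even though the task runs in a public subnet/VPC (see the excerpt below)? Is something missing from my TaskSchedule resource?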

TaskSchedule:
  Type: AWS::Events::Rule
  ...
  Properties:
    ...
    Targets:
      - ...
        EcsParameters:
          LaunchType: FARGATE
          TaskCount: 1
          TaskDefinitionArn: !Ref TaskDefinition
          NetworkConfiguration:
            AwsVpcConfiguration:
              Subnets: !Ref ScheduledTaskSubnets

Thanks in advance.

1 Answer

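[The following answer has been translated] Your task cannot communicate with the internet because it is not assigned a public IP address. A 0.0.0.0/0 route to an internet gateway is not enough on its own: the internet gateway only carries IPv4 traffic for network interfaces that have a public IP address, and the Fargate task's network interface gets one only if you ask for it. Add AssignPublicIp: ENABLED to your AwsVpcConfiguration. For details, see https://docs.aws.amazon.com/eventbridge/latest/APIReference/API_AwsVpcConfiguration.html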
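A sketch of the TaskSchedule resource from the question with the suggested property added, assuming the rest of the template stays as posted. SecurityGroups is still omitted; per the EventBridge AwsVpcConfiguration reference, the VPC's default security group should then be used, and in an unmodified default VPC that group allows all outbound traffic.

```yaml
TaskSchedule:
  Type: AWS::Events::Rule
  DependsOn:
    - TaskScheduleRole
    - DeadLetterQueue
  Properties:
    Description: Trigger the task once every minute
    ScheduleExpression: cron(0/1 * * * ? *)
    State: ENABLED
    Targets:
      - Arn: !GetAtt Cluster.Arn
        Id: ClusterTarget
        RoleArn: !GetAtt TaskScheduleRole.Arn
        DeadLetterConfig:
          Arn: !GetAtt DeadLetterQueue.Arn
        EcsParameters:
          LaunchType: FARGATE
          TaskCount: 1
          TaskDefinitionArn: !Ref TaskDefinition
          NetworkConfiguration:
            AwsVpcConfiguration:
              # Give the task's network interface a public IP so its traffic
              # can leave through the internet gateway (FARGATE launch type only)
              AssignPublicIp: ENABLED
              Subnets: !Ref ScheduledTaskSubnets
```

Only AssignPublicIp is new; every other property is copied from the question's template.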

Expert
Answered 5 months ago
