ResourceInitializationError when running a job in AWS Batch


I've created a Docker image, pushed it to a private ECR repository, and configured an AWS Batch cluster, queue, and job definition. When I submit a job, it immediately goes to the STARTING state and then fails with:

ResourceInitializationError: unable to pull secrets or registry auth: execution resource retrieval
failed: unable to retrieve ecr registry auth: service call has been retried 3 time(s):
RequestError: send request failed caused by: Post https://api.ecr.us-west-2.amazonaws.com/: dial
tcp 54.240.255.116:443: i/o timeout

The task appears to time out while calling the ECR API for registry credentials, so the container image is never pulled. My cluster has the following specs:

  • Fargate provision model
  • Default VPC
  • Default security group (allows all outbound traffic, but only inbound from the default SG)
  • Default subnets (4 subnets with a route to an internet gateway and a single ACL rule allowing all traffic)

The job definition has an execution role with the managed policy AmazonECSTaskExecutionRolePolicy and has the "Public IP" option disabled.

The network configuration seems sufficient to pull images from the internet, but I'm still getting the timeout error. The IAM role also seems to have the policies needed to authenticate with my private ECR repository. Can someone help me debug this?
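One way to sanity-check the network assumption above is a small sketch like the following. It encodes a general VPC rule: an internet gateway only carries traffic for network interfaces that have a public IP, while a NAT gateway route or ECR VPC endpoints work for private-only tasks. The function name, its parameters, and the route-target IDs are hypothetical and only for illustration; this is not an AWS API.

```python
def can_reach_ecr(default_route_target: str,
                  assign_public_ip: bool,
                  has_ecr_vpc_endpoints: bool = False) -> bool:
    """Rough check of whether a Fargate task can reach the ECR API.

    default_route_target: target of the subnet's 0.0.0.0/0 route,
        e.g. "igw-..." or "nat-..." (hypothetical IDs).
    assign_public_ip: whether the task gets a public IP.
    has_ecr_vpc_endpoints: assumption that the VPC already has the
        endpoints ECR needs (ecr.api, ecr.dkr, and S3).
    """
    if has_ecr_vpc_endpoints:
        # Traffic stays inside the VPC, no internet path required.
        return True
    if default_route_target.startswith("nat-"):
        # A NAT gateway translates private addresses for outbound traffic.
        return True
    if default_route_target.startswith("igw-"):
        # An internet gateway only works for interfaces with a public IP.
        return assign_public_ip
    return False


if __name__ == "__main__":
    # The setup described above: IGW route, "Public IP" disabled.
    print(can_reach_ecr("igw-0123456789abcdef0", assign_public_ip=False))
```

Under these assumptions, the configuration described in this question evaluates to `False`, which would be consistent with the `i/o timeout` in the error message.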

ianliu
asked 2 years ago · 175 views
1 Answer

I ran into the same problem while working on the AWS Financial Industry Quest "Grid computing for capital markets". My research suggested checking network connectivity and VPC endpoints, but those should not be a problem in an environment that is built and managed by AWS. So weird.
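For anyone who wants to rule out the network side, here is a minimal sketch, assuming the job runs on Fargate in subnets that route to an internet gateway. It builds the arguments for the boto3 Batch `register_job_definition` call with `assignPublicIp` set to `ENABLED`. The helper names, the job definition name, the image URI, the role ARN, and the vCPU/memory sizes are placeholders and not values from this thread.

```python
def fargate_job_definition(name: str, image: str, execution_role_arn: str,
                           assign_public_ip: bool = True) -> dict:
    """Build keyword arguments for batch.register_job_definition."""
    return {
        "jobDefinitionName": name,
        "type": "container",
        "platformCapabilities": ["FARGATE"],
        "containerProperties": {
            "image": image,
            "executionRoleArn": execution_role_arn,
            # Placeholder sizes; 0.25 vCPU with 512 MiB is a valid Fargate pair.
            "resourceRequirements": [
                {"type": "VCPU", "value": "0.25"},
                {"type": "MEMORY", "value": "512"},
            ],
            # Equivalent of the "Public IP" toggle in the console.
            "networkConfiguration": {
                "assignPublicIp": "ENABLED" if assign_public_ip else "DISABLED",
            },
        },
    }


def register(job_definition: dict) -> dict:
    """Send the definition to AWS Batch (needs boto3 and credentials)."""
    import boto3  # imported lazily so the builder works without boto3
    return boto3.client("batch").register_job_definition(**job_definition)


if __name__ == "__main__":
    definition = fargate_job_definition(
        name="example-job",  # placeholder
        image="123456789012.dkr.ecr.us-west-2.amazonaws.com/example:latest",  # placeholder
        execution_role_arn="arn:aws:iam::123456789012:role/example-execution-role",  # placeholder
    )
    print(definition["containerProperties"]["networkConfiguration"])
```

If a public IP is not acceptable, the usual alternatives are private subnets with a NAT gateway, or VPC endpoints for ECR and S3.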

SST
answered 4 months ago