ResourceInitializationError when running a job in AWS Batch

I've created a Docker image, pushed it to a private ECR repository, and configured an AWS Batch cluster, queue, and job definition. When I submit a job, it immediately goes to the STARTING state and then fails with:

ResourceInitializationError: unable to pull secrets or registry auth: execution resource retrieval
failed: unable to retrieve ecr registry auth: service call has been retried 3 time(s):
RequestError: send request failed caused by: Post https://api.ecr.us-west-2.amazonaws.com/: dial
tcp 54.240.255.116:443: i/o timeout

This seems to be a problem with the container image not being pulled. My cluster has the following specs:

  • Fargate provisioning model
  • Default VPC
  • Default security group (allows all outbound traffic, but only inbound from the default SG)
  • Default subnets (4 subnets with a route to an internet gateway and a single ACL rule allowing all traffic)

The job definition has an execution role with the managed policy AmazonECSTaskExecutionRolePolicy and has the "Public IP" option disabled.

The network configuration seems to be enough to pull images from the internet, but I'm still getting the timeout error. Also, the IAM Role seems to have the relevant policies to authenticate with my private ECR. Can someone help me debug this?
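
One thing worth checking, given the setup described above: a Fargate task with no public IP, in a subnet whose only way out is an internet gateway, has no outbound path to the ECR API, which would be consistent with the i/o timeout. Below is a minimal sketch of job definition container properties with the public IP enabled. The helper name, image URI, role ARN, and resource sizes are placeholders, not values from this thread.

```python
def fargate_container_properties(image, execution_role_arn, assign_public_ip=True):
    """Build a containerProperties dict for an AWS Batch Fargate job definition.

    Hypothetical helper for illustration; the keys follow the
    RegisterJobDefinition request shape.
    """
    return {
        "image": image,
        "executionRoleArn": execution_role_arn,
        # Placeholder sizes; 0.25 vCPU with 512 MiB is a valid Fargate combination.
        "resourceRequirements": [
            {"type": "VCPU", "value": "0.25"},
            {"type": "MEMORY", "value": "512"},
        ],
        # In a public subnet without a NAT gateway or VPC endpoints, the task
        # needs a public IP to reach ECR through the internet gateway.
        "networkConfiguration": {
            "assignPublicIp": "ENABLED" if assign_public_ip else "DISABLED",
        },
    }


# Placeholder account ID, repository, and role name.
props = fargate_container_properties(
    image="123456789012.dkr.ecr.us-west-2.amazonaws.com/my-image:latest",
    execution_role_arn="arn:aws:iam::123456789012:role/my-execution-role",
)
print(props["networkConfiguration"])
```

The resulting dict could be passed as containerProperties to boto3's batch client register_job_definition call, together with type="container" and platformCapabilities=["FARGATE"]. If the tasks must stay without public IPs, the alternative is private subnets with a NAT gateway or VPC endpoints.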

ianliu
asked 2 years ago, 175 views
1 Answer

I ran into the same problem while working on the AWS Financial Industry Quest "Grid computing for capital markets". My research pointed me to checking network connectivity and VPC endpoints, but those should not be a problem in an environment that AWS itself built and manages. So weird.
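
For anyone following the VPC endpoint route: as far as I know, a Fargate task without internet access needs the endpoints below to pull from a private ECR repository (assuming platform version 1.4.0 or later). This is a minimal sketch that only builds the service names for a region. The helper name is made up for illustration, and the CloudWatch Logs endpoint only matters if the job uses the awslogs log driver.

```python
def ecr_pull_endpoints(region, use_awslogs=True):
    """Return (service_name, endpoint_type) pairs needed to pull from private ECR.

    Hypothetical helper; the service names follow the com.amazonaws.<region>.<service>
    pattern used for VPC endpoints.
    """
    endpoints = [
        # ECR API: used to fetch the registry authorization token.
        (f"com.amazonaws.{region}.ecr.api", "Interface"),
        # ECR Docker registry: used for the image manifest and layer requests.
        (f"com.amazonaws.{region}.ecr.dkr", "Interface"),
        # S3: ECR stores image layers in S3, so a gateway endpoint is needed too.
        (f"com.amazonaws.{region}.s3", "Gateway"),
    ]
    if use_awslogs:
        # CloudWatch Logs: only needed when the container uses the awslogs driver.
        endpoints.append((f"com.amazonaws.{region}.logs", "Interface"))
    return endpoints


for service_name, endpoint_type in ecr_pull_endpoints("us-west-2"):
    print(f"{endpoint_type:9} {service_name}")
```

The timeout in the question is on api.ecr.us-west-2.amazonaws.com, which is the host the ecr.api endpoint would cover.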

SST
answered 4 months ago
