AWS Fargate Bug - Failed to pull from ECR 3 timeouts


I keep getting the same issue at random. Sometimes containers update just fine. Other times the pull fails about 3 times and Fargate stops trying to spin up the new instance.

ResourceInitializationError: unable to pull secrets or registry auth: execution resource retrieval failed: unable to retrieve ecr registry auth: service call has been retried 3 time(s): RequestError: send request failed caused by: Post "https://api.ecr.us-west-2.amazonaws.com/": dial tcp 52.119.171.233:443: i/o timeout

  • Do the subnets where your tasks are placed have outbound connectivity to the internet? If not, are you using VPC endpoints to connect to the ECR service from your subnets?

  • Yes they do. Otherwise it would always fail.

  • @Venkat Penmetsa, looks like you were correct: the containers that failed were the ones that tried running in the private subnet, and the ones that succeeded were the ones in the public subnet.

asked 2 years ago, 12233 views
1 Answer

Hello rePost-User-9949458,

Thank you for answering my questions in the comments. I'd like to summarize the issue for anyone dealing with the same problem.

By default, when a new ECS task is launched, the subnet where the task is placed must have outbound connectivity to the internet, because the task needs to reach the Amazon ECR service over the internet.

This isn't a concern for public subnets, as they have connectivity to the internet (as long as the task is assigned a public IP address). For private subnets, however, you will need to either set up a NAT gateway so that your subnets can reach ECR through the internet, or set up ECR VPC endpoints.
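To check which case a given subnet falls into, you can look at the default route in its route table. Below is a minimal sketch, assuming the routes are shaped like the `Routes` entries returned by the EC2 DescribeRouteTables API (`DestinationCidrBlock`, `GatewayId`, `NatGatewayId`, `State`). The function name `classify_outbound_path` and the sample route data are hypothetical and only for illustration.

```python
from typing import Dict, List


def classify_outbound_path(routes: List[Dict[str, str]]) -> str:
    """Report how a subnet's route table reaches the internet, if at all.

    `routes` is assumed to look like the `Routes` list of one route table
    in an EC2 DescribeRouteTables response.
    """
    for route in routes:
        # Only the IPv4 default route matters for reaching the public ECR endpoint.
        if route.get("DestinationCidrBlock") != "0.0.0.0/0":
            continue
        # Skip routes whose target no longer exists ("blackhole").
        if route.get("State", "active") != "active":
            continue
        if route.get("GatewayId", "").startswith("igw-"):
            return "internet-gateway"  # public subnet
        if route.get("NatGatewayId"):
            return "nat-gateway"  # private subnet with NAT
    return "none"  # no outbound path: VPC endpoints (or a NAT gateway) are needed


# Hypothetical sample data for a private and a public subnet.
private_routes = [{"DestinationCidrBlock": "10.0.0.0/16", "GatewayId": "local"}]
public_routes = private_routes + [
    {"DestinationCidrBlock": "0.0.0.0/0", "GatewayId": "igw-0123456789abcdef0"}
]

print(classify_outbound_path(private_routes))  # none
print(classify_outbound_path(public_routes))   # internet-gateway
```

A result of `none` matches the situation in the question: tasks placed in that subnet time out when calling ECR, while tasks placed in a subnet with a working default route succeed.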

The ECR VPC endpoints option is more secure, because your tasks talk to the ECR service directly over the Amazon network and the traffic does not traverse the internet.
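For the VPC endpoints route, here is a rough sketch of the endpoint service names a private subnet would typically need for Fargate to pull from ECR. It assumes Fargate platform version 1.4.0 or later, where both ECR interface endpoints plus the S3 gateway endpoint are required. The helper name `ecr_endpoint_services` and the `use_awslogs` flag are hypothetical; the region is the one from the error message in the question.

```python
from typing import List


def ecr_endpoint_services(region: str, use_awslogs: bool = False) -> List[str]:
    """List the VPC endpoint service names needed to pull from ECR privately."""
    services = [
        # Interface endpoint for ECR API calls (the call that timed out in the question).
        f"com.amazonaws.{region}.ecr.api",
        # Interface endpoint for the Docker registry API used to pull images.
        f"com.amazonaws.{region}.ecr.dkr",
        # Gateway endpoint for S3, where ECR stores image layers.
        f"com.amazonaws.{region}.s3",
    ]
    if use_awslogs:
        # Only needed if the task uses the awslogs log driver.
        services.append(f"com.amazonaws.{region}.logs")
    return services


for name in ecr_endpoint_services("us-west-2"):
    print(name)
```

Each name would be passed as the `ServiceName` when creating the endpoint in your VPC. The security group on the interface endpoints also has to allow inbound HTTPS (port 443) from the tasks, otherwise the same timeout can still occur.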

Thank you!

AWS SUPPORT ENGINEER
answered 2 years ago
AWS EXPERT
reviewed 2 years ago
  • I am facing the same issue while deploying a service on AWS Fargate, and I would greatly appreciate some guidance from the community. I have tried the solutions on Stack Overflow, from assigning a public IP address to my task to setting the inbound and outbound rules of my security group. I have tried about 80% of the answers and found no solution yet.

    To provide context, here's an overview of my AWS architecture:

    I have a VPC with three public subnets and two private subnets. The service I'm trying to deploy (let's call it the "email service") is intended to run in one of the public subnets. The VPC is connected to the internet via an internet gateway. I've configured security group rules for both inbound and outbound traffic to allow the necessary protocols (TCP, HTTP, HTTPS, and SMTP) with their respective ports and destinations. I've ensured that my IAM role is configured with the appropriate permissions. Despite these efforts, I'm encountering this error when trying to deploy the service. I've also seen suggestions from other users on Stack Overflow, such as assigning a public IP to the task and setting outbound rules in the security group, but these haven't resolved the issue for me.

    Additional information:

    The Fargate platform version I'm using is 1.4.0
