AWS Fargate bug: failed to pull from ECR after 3 timeouts


I keep getting the same issue, randomly and over and over. Sometimes containers update just fine; other times the deployment fails about three times and Fargate stops trying to spin up the new instance.

ResourceInitializationError: unable to pull secrets or registry auth: execution resource retrieval failed: unable to retrieve ecr registry auth: service call has been retried 3 time(s): RequestError: send request failed caused by: Post "https://api.ecr.us-west-2.amazonaws.com/": dial tcp 52.119.171.233:443: i/o timeout

  • Do the subnets where the tasks are placed have outbound connectivity to the internet? If not, are you using VPC endpoints to connect to the ECR service from those subnets?

  • Yes, they do. Otherwise it would always fail.

  • @Venkat Penmetsa, it looks like you were correct: the containers that failed were the ones that tried to run in the private subnet, and the ones that succeeded were the ones in the public subnet.
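One way to confirm which subnets are affected is to repeat the failing connection yourself. The following is a minimal Python sketch, assuming you can run it from an EC2 instance or a debug task placed in the same subnet and security group as the failing tasks; `can_reach` is a hypothetical helper, not part of any AWS tool. It attempts the same TCP connection to port 443 that timed out in the error message.

```python
import socket


def can_reach(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical diagnostic helper: a False result for the regional ECR API
    host corresponds to the "dial tcp ...:443: i/o timeout" in the error.
    """
    try:
        # create_connection resolves the name and tries each address in turn.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts (socket.timeout is a subclass of OSError),
        # refused connections, and DNS resolution failures.
        return False
```

Calling `can_reach("api.ecr.us-west-2.amazonaws.com")` would be expected to return `True` from a public subnet with a public IP, and `False` from a private subnet that has neither a NAT gateway nor VPC endpoints, which matches the pattern described in the last comment.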

1 Answer

Hello rePost-User-9949458,

Thank you for answering my questions in the comments. I'd like to summarize the issue for anyone dealing with the same problem.

By default, when a new ECS task is launched, the subnet where the task is placed needs outbound connectivity to the internet, because the task has to reach the Amazon ECR service over the internet to retrieve registry credentials and pull the image.

This isn't a concern for public subnets, since they route to the internet through an internet gateway (for Fargate tasks, this also requires the task to be assigned a public IP). For private subnets, however, you need either a NAT gateway so that the subnets can reach ECR over the internet, or ECR VPC endpoints.
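Whether a subnet counts as "public" or "private" is decided by its route table, not by its name. As a minimal sketch, assuming the route tables come from the EC2 `DescribeRouteTables` API and are already filtered to the task's VPC, the hypothetical `classify_subnet` helper below finds the table that applies to a subnet (its explicit association, otherwise the VPC's main table) and inspects the IPv4 default route.

```python
from typing import Any, Dict, List, Optional

RouteTable = Dict[str, Any]  # one entry of DescribeRouteTables["RouteTables"]


def route_table_for_subnet(route_tables: List[RouteTable], subnet_id: str) -> Optional[RouteTable]:
    """Return the route table that applies to subnet_id.

    route_tables must all belong to the subnet's VPC. An explicit subnet
    association wins; otherwise the VPC's main route table applies.
    """
    main_table = None
    for table in route_tables:
        for assoc in table.get("Associations", []):
            if assoc.get("SubnetId") == subnet_id:
                return table
            if assoc.get("Main"):
                main_table = table
    return main_table


def classify_subnet(route_tables: List[RouteTable], subnet_id: str) -> str:
    """Classify a subnet by the target of its IPv4 default route (0.0.0.0/0)."""
    table = route_table_for_subnet(route_tables, subnet_id)
    if table is None:
        return "unknown"
    for route in table.get("Routes", []):
        # Skip anything that is not a usable IPv4 default route.
        if route.get("DestinationCidrBlock") != "0.0.0.0/0" or route.get("State") == "blackhole":
            continue
        if route.get("GatewayId", "").startswith("igw-"):
            return "public"
        if route.get("NatGatewayId"):
            return "private-with-nat"
        # Some other target, e.g. a NAT instance or a transit gateway.
        return "private-other-default-route"
    # No default route: only VPC endpoints can provide a path to ECR.
    return "private-isolated"
```

With boto3, the input could be fetched via `boto3.client("ec2", region_name="us-west-2").describe_route_tables(Filters=[{"Name": "vpc-id", "Values": [vpc_id]}])["RouteTables"]`; large VPCs may need pagination.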

The ECR VPC endpoint option is more secure, because it lets your tasks talk to the ECR service directly over the Amazon network, so your traffic never traverses the public internet.
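For the VPC endpoint route, ECR image pulls generally need two interface endpoints (`ecr.api` for registry authentication, which is the call that timed out here, and `ecr.dkr` for the Docker registry API) plus an S3 gateway endpoint, because image layers are stored in Amazon S3. The sketch below is a minimal illustration, assuming boto3-style keyword arguments for the EC2 `CreateVpcEndpoint` call; `ecr_endpoint_requests` and `create_endpoints` are hypothetical helpers, and all IDs are placeholders you would supply. Depending on the task definition, endpoints for CloudWatch Logs (with the `awslogs` driver) and Secrets Manager or SSM Parameter Store (when injecting secrets) may also be required; they are not included here.

```python
from typing import Any, Dict, List


def ecr_endpoint_requests(region: str, vpc_id: str, subnet_ids: List[str],
                          security_group_ids: List[str],
                          route_table_ids: List[str]) -> List[Dict[str, Any]]:
    """Build keyword arguments for EC2 create_vpc_endpoint calls (hypothetical helper)."""
    interface_common = {
        "VpcEndpointType": "Interface",
        "VpcId": vpc_id,
        "SubnetIds": subnet_ids,
        # This security group must allow inbound TCP 443 from the task subnets.
        "SecurityGroupIds": security_group_ids,
        # Private DNS makes the normal ECR hostnames resolve to the endpoint.
        "PrivateDnsEnabled": True,
    }
    return [
        # ECR API: registry authentication.
        {**interface_common, "ServiceName": f"com.amazonaws.{region}.ecr.api"},
        # Docker registry API used for the image pull itself.
        {**interface_common, "ServiceName": f"com.amazonaws.{region}.ecr.dkr"},
        # Image layers are served from S3 through a gateway endpoint.
        {"VpcEndpointType": "Gateway", "VpcId": vpc_id,
         "ServiceName": f"com.amazonaws.{region}.s3",
         "RouteTableIds": route_table_ids},
    ]


def create_endpoints(ec2_client: Any, requests: List[Dict[str, Any]]) -> List[str]:
    """Apply the requests with a boto3 EC2 client and return the new endpoint IDs."""
    return [ec2_client.create_vpc_endpoint(**req)["VpcEndpoint"]["VpcEndpointId"]
            for req in requests]
```

Keeping the request building separate from the API call makes it easy to review the parameters before anything is created; the client would typically be `boto3.client("ec2", region_name="us-west-2")`.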

Thank you!

AWS Support Engineer, answered 2 years ago
Reviewed by AWS Expert, 2 years ago
  • I am facing the same issue while deploying a service on AWS Fargate, and I would greatly appreciate some guidance from the community. I have tried the solutions suggested on Stack Overflow, from assigning a public IP address to my task to setting the inbound and outbound rules of my security group. I have tried about 80% of the answers, with no solution yet.

    To provide context, here's an overview of my AWS architecture:

    I have a VPC with three public subnets and two private subnets. The service I'm trying to deploy (let's call it the "email service") is intended to run in one of the public subnets. The VPC is connected to the internet through an internet gateway. I've configured security group rules for both inbound and outbound traffic to allow the necessary protocols (TCP, HTTP, HTTPS, and SMTP) with their respective ports and destinations, and I've confirmed that my IAM role has the appropriate permissions. Despite this, I still get this error when deploying the service. Other users on Stack Overflow have suggested assigning a public IP to the task and adding outbound rules to the security group, but neither has resolved the issue for me.

    Additional information:

    The Fargate platform version I'm using is 1.4.0.
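For a task in a public subnet, two settings are worth checking against this timeout: the task's `awsvpcConfiguration`, where the ECS API defaults `assignPublicIp` to `DISABLED`, and the security group's outbound rules, which must allow TCP 443. Inbound rules do not affect image pulls, because security groups are stateful. The following minimal sketch checks both, assuming the security group rules are in the EC2 `IpPermissionsEgress` shape and that you already know whether the subnet's route table sends `0.0.0.0/0` to an internet gateway; `preflight` and `egress_allows_https` are hypothetical helpers, and the egress check deliberately ignores prefix lists and narrower CIDR ranges.

```python
from typing import Any, Dict, List

HTTPS_PORT = 443


def egress_allows_https(egress_rules: List[Dict[str, Any]]) -> bool:
    """Return True if any egress rule allows TCP 443 to 0.0.0.0/0.

    Simplification: rules that use prefix lists, security-group references,
    or narrower CIDR blocks are ignored.
    """
    for rule in egress_rules:
        cidrs = [r.get("CidrIp") for r in rule.get("IpRanges", [])]
        if "0.0.0.0/0" not in cidrs:
            continue
        protocol = rule.get("IpProtocol")
        if protocol == "-1":  # "-1" means all protocols and ports
            return True
        if protocol in ("tcp", "6") and rule.get("FromPort", 0) <= HTTPS_PORT <= rule.get("ToPort", -1):
            return True
    return False


def preflight(subnet_is_public: bool, awsvpc: Dict[str, Any],
              egress_rules: List[Dict[str, Any]]) -> List[str]:
    """List likely reasons a Fargate task cannot reach ECR (empty list: none found)."""
    problems = []
    # The ECS API treats an omitted assignPublicIp as DISABLED.
    if subnet_is_public and awsvpc.get("assignPublicIp", "DISABLED") != "ENABLED":
        problems.append("task is in a public subnet but assignPublicIp is not ENABLED")
    if not egress_allows_https(egress_rules):
        problems.append("security group egress does not allow TCP 443 to 0.0.0.0/0")
    return problems
```

An empty result only means these two common causes were ruled out; the subnet's route table and any network ACLs still need to allow the traffic.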
