[Solution Found] Private ECR is down. docker login fails

0

I have been getting constant timeouts since at least 8am GMT (8 hours offline so far) while trying to use private ECR. I have tried both us-east-2 and us-east-1.

aws ecr get-login-password --region <MY_REGION> | docker login --username AWS --password-stdin <MY_ID>.dkr.ecr.<MY_REGION>.amazonaws.com
Error response from daemon: Get "https://****.dkr.ecr.***.amazonaws.com/v2/": dial tcp: lookup ****.dkr.ecr.****.amazonaws.com on 1.1.1.1:53: read udp 192.168.107.2:42814->1.1.1.1:53: i/o timeout

No issues are shown at https://health.aws.amazon.com/health. The private repositories exist and are listed with URI following the pattern <my_user_id>.dkr.ecr.us-east-1.amazonaws.com/cdk-<id>-container-assets-<my_user_id>-us-east-1

I have tried using a VPN, changed local DNS servers, and verified the hostname at https://check-host.net/check-ping. All of them return timeouts.

ping <my_user_id>.dkr.ecr.us-east-2.amazonaws.com
PING nlb1-e62e779dc8783879.elb.us-east-2.amazonaws.com (3.16.93.245): 56 data bytes
Request timeout for icmp_seq 0
Request timeout for icmp_seq 1
Request timeout for icmp_seq 2
Request timeout for icmp_seq 3
^C
--- nlb1-e62e779dc8783879.elb.us-east-2.amazonaws.com ping statistics ---
5 packets transmitted, 0 packets received, 100.0% packet loss

Update: New things I have tried:

  1. I have wiped out the whole account using aws-nuke before recreating the private ECR repository.
  2. I also tried creating new EC2 instances with a security group allowing for All ICMP traffic.
  3. Selected another region that I had never used before (ca-central-1) and created a new private ECR repository there.
  4. I tried pinging from the new EC2 instance in addition to pinging from my machine.
  5. I tried following the "Push commands" steps for ECR to connect to ECR.


3 Answers
1

If you're experiencing problems with your AWS services and you've checked the AWS Health Dashboard and there are no known issues, the best course of action is to open a support ticket with AWS. The support engineer will be able to get limited access to your AWS account and help you look at the internal logs as well.

Here are the steps to open a support ticket with AWS:

  1. Navigate to the AWS Management Console.
  2. In the navigation bar on the upper-right, choose 'Support', and then 'Support Center'.
  3. Choose 'Create case'.
  4. Choose 'Technical support'.
  5. Provide the necessary information about the issue. For your case, you should specify that you're having trouble with the ECR service in the us-east-2 and us-east-1 regions, and explain what's happening in as much detail as possible. Attach any relevant screenshots if you can.
  6. Click 'Submit'.

By opening a support ticket, you're directly engaging with AWS Support, who will then be able to help you troubleshoot your issue. Make sure to mention that this service was working for you before, but now it isn't. This will help them pinpoint any changes that could have caused the issue. If the issue turns out to be on AWS's side, they will report it accordingly. If it turns out to be a problem with your own setup or configuration, they should be able to guide you on how to fix it.

Regarding NACL and ICMP:

  • NACL, or Network Access Control List, is a feature of VPC that acts as a firewall for controlling traffic in and out of one or more subnets.
  • ICMP, or Internet Control Message Protocol, is a supporting protocol in the Internet protocol suite used by network devices to send error messages indicating, for instance, that a requested service is not available or that a host or router could not be reached. ping uses the ICMP protocol, which is why it was mentioned in the previous response.

If the answer is helpful, please click "Accept Answer" and upvote it.

EXPERT
answered 10 months ago
AWS
EXPERT
iBehr
reviewed 10 months ago
  • When I try to open a support case it shows "Technical support is unavailable under the Basic Support plan".

    I tried following some guides to fix NACL and ICMP, but:

    • The EC2's "Security Groups" shows "No security groups found";
    • The VPC's "Network ACLs" shows "No network ACLs found in this Region";

    Am I checking it in the wrong places?

    I have run aws-nuke and cleared the whole account before trying new deployments, so I don't think it is any wrong setting on my side as the ping fails on a totally wiped account. Is this expected?

  • Hi Victor,

    I just added steps for adding security group to your instance, which would allow ICMP protocol and you should be able to ping the instance. Please try it out and let me know how it goes.

0
Accepted Answer

I have found a solution to this problem! Thank you all for the assistance!

The problem

It turns out that:

  1. The issue with docker login failing was not on the AWS side, but on my Docker client side.
  2. The method I used to verify if the ECR repo was up or down was incorrect (do not try to ping it!)

I believe that one of the recent updates of colima (my Docker client) caused internal network issues. As a workaround, I have stopped using colima and switched to Docker Desktop instead.
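A rough way to narrow this kind of failure down, sketched under assumptions: the error in the question shows the lookup timing out from 192.168.107.2, which looks like an address inside the Docker VM rather than on the host, so comparing DNS resolution on the host with resolution inside a container may show where it breaks. The helper names `is_dns_timeout` and `compare_dns` are hypothetical, and `busybox` is only assumed here as a convenient image that ships `nslookup`.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical, not part of Docker or the AWS CLI.

# Returns 0 if a docker error message looks like a DNS lookup timeout
# (the pattern seen in the question), 1 otherwise.
is_dns_timeout() {
  case "$1" in
    *"dial tcp: lookup "*"i/o timeout"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Resolve a hostname on the host, then inside a throwaway container.
# Assumes nslookup is installed locally and the busybox image is available.
compare_dns() {
  local host="$1"
  echo "host:"
  nslookup "$host" || echo "host lookup failed"
  echo "container:"
  docker run --rm busybox nslookup "$host" || echo "container lookup failed"
}
```

Usage would be something like `compare_dns "$MY_USER_ID.dkr.ecr.$MY_REGION.amazonaws.com"`. If the host resolves the name but the container does not, the problem is likely in the Docker client's networking rather than in ECR. Note that if the image is not cached locally, the pull itself may fail for the same reason.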

Additionally, the lack of information on how ECR works led me and others trying to help to overlook the fact that we are not supposed to ping ECR or modify inbound rules.

How to verify that ECR is actually up

I gotta thank gpt4 for this one!

  1. Set ECR_ENDPOINT=$MY_USER_ID.dkr.ecr.$MY_REGION.amazonaws.com, then

  2. Verify DNS resolution:

nslookup $ECR_ENDPOINT

If the domain name is not resolving correctly, try using a different DNS server, such as Google's public DNS (8.8.8.8 and 8.8.4.4) or Cloudflare's public DNS (1.1.1.1 and 1.0.0.1).

  3. Test ECR access: Run the following command to check if you can access the ECR repository:

aws ecr describe-repositories --repository-names $REPOSITORY_NAME --region $MY_REGION

  4. Test connectivity to the ECR endpoint (instead of using ping):

curl -I https://$ECR_ENDPOINT/v2/

If you receive a 401 Unauthorized response, it means you have successfully connected to the ECR endpoint, but you need to authenticate using the get-login-password command.

If you still encounter a timeout or connectivity issue, check your local network settings, such as firewalls or proxies, that might be blocking the connection to the ECR endpoint. You may need to configure your network settings to allow access to the ECR endpoint or consult your network administrator for assistance.

  5. Test docker login: If all tests above have passed but you still see errors when running aws ecr get-login-password --region $MY_REGION | docker login --username AWS --password-stdin $ECR_ENDPOINT, then the problem is most likely with your Docker client.

By following these steps, you can verify the ECR status and troubleshoot any issues related to your Docker client.
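The DNS and HTTPS checks above can be sketched as a small script. This is a minimal sketch, not an official tool: the function names `ecr_endpoint`, `interpret_status`, and `check_ecr` are hypothetical, and it assumes `nslookup` exits non-zero on a failed lookup (some older versions do not) and that `curl` is installed.

```shell
#!/usr/bin/env bash
# Sketch only: function names are hypothetical; the commands mirror the steps above.

# Build the registry hostname from an account ID and a region.
ecr_endpoint() {
  printf '%s.dkr.ecr.%s.amazonaws.com\n' "$1" "$2"
}

# Interpret the HTTP status returned by https://<endpoint>/v2/.
# 401 means the endpoint was reached and only authentication is missing.
# 000 is what curl's %{http_code} reports when no HTTP response was received.
interpret_status() {
  case "$1" in
    401) echo "reachable, authentication required" ;;
    000) echo "no HTTP response, check DNS, firewall, or proxy" ;;
    *)   echo "reachable, unexpected status $1" ;;
  esac
}

# Run the DNS check, then the HTTPS check, stopping at the first failure.
check_ecr() {
  local endpoint status
  endpoint="$(ecr_endpoint "$1" "$2")"
  nslookup "$endpoint" >/dev/null || { echo "DNS lookup failed for $endpoint"; return 1; }
  status="$(curl -s -o /dev/null -I --max-time 10 -w '%{http_code}' "https://$endpoint/v2/")"
  interpret_status "$status"
  # Any status other than 000 means an HTTP response came back.
  [ "$status" != "000" ]
}
```

Calling `check_ecr "$MY_USER_ID" "$MY_REGION"` should print "reachable, authentication required" when the endpoint is up. If it does and docker login still fails, that points at the Docker client.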

Victor
answered 10 months ago
  • Awesome, great to hear this. I was also wondering, why would we ping ECR, which I asked in my last comment too. :) Cheers.

0

Hi,

Was it working earlier? If not, did you check the attached security group and the NACL of the subnet in which this instance exists? Most likely it's a security group issue. Make sure you have allowed the ICMP protocol for your machine (the one you are pinging from).

Ping uses the ICMP protocol. Add an inbound rule for ICMP to any of the security groups attached to <my_user_id>.dkr.ecr.us-east-2.amazonaws.com and then try again. It should work as long as there is no deny at the NACL level.

I checked internally as well and there is no issue on the service side.

Edit: How to add security group:

  1. Navigate to the AWS Management Console.

  2. In the search bar on the upper-left, search for 'EC2'

  3. Under Network & Security, select Security Groups

  4. Choose Create Security Group

  5. Give it a name and description, and choose the VPC that your instance is in

  6. Under Inbound rules, select All ICMP - IPv4 as the type, and Anywhere - IPv4 (for testing purposes only) or My IP as the source. I've attached a screenshot for your reference. Additional rules with different protocols and source IPs can be added based on your usage.

  7. Leave Outbound rules as is.

  8. Click Create security group

  9. Now go to EC2 -> Instances

  10. Select your instance, open the Actions drop-down on the upper-right, select Security, and then add the security group you created above.

  11. Now ping this instance from your own machine. You should receive a ping response.
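For reference, the console steps above can be approximated with the AWS CLI. This is a sketch under assumptions: `icmp_rule_args` and `allow_icmp` are hypothetical helper names, the group name `allow-icmp-test` is made up, and the VPC ID, instance ID, and CIDR are placeholders you would supply. It applies to pinging an EC2 instance only, not an ECR endpoint.

```shell
#!/usr/bin/env bash
# Sketch only: helper names and the group name are hypothetical; IDs are placeholders.

# Print the AWS CLI arguments for an inbound "All ICMP - IPv4" rule.
# --port -1 covers all ICMP types and codes.
icmp_rule_args() {
  printf 'ec2 authorize-security-group-ingress --group-id %s --protocol icmp --port -1 --cidr %s\n' "$1" "$2"
}

# Create a security group in the given VPC, add the ICMP rule, and attach it.
# Note: --groups replaces the instance's whole security group list,
# unlike the console step, which adds a group to the existing ones.
allow_icmp() {
  local vpc_id="$1" instance_id="$2" cidr="$3" sg_id
  sg_id="$(aws ec2 create-security-group --group-name allow-icmp-test \
    --description "Allow ICMP for ping testing" --vpc-id "$vpc_id" \
    --query GroupId --output text)" || return 1
  # Word splitting of the printed arguments is intentional here.
  aws $(icmp_rule_args "$sg_id" "$cidr") || return 1
  aws ec2 modify-instance-attribute --instance-id "$instance_id" --groups "$sg_id"
}
```

A call might look like `allow_icmp vpc-0abc i-0abc 203.0.113.7/32`, where all three values are illustrative. Restricting the CIDR to your own IP is safer than opening ICMP to 0.0.0.0/0.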

AWS
EXPERT
answered 10 months ago
  • It was working before, but I haven't tried deploying for quite a while (a few weeks at least). I use SST for deployments and I haven't touched anything related to networking settings, so everything should be default settings. I don't even know what NACL and ICMP are 😟

  • Can you make sure nothing has changed since it last worked? When we connect to an instance without allowing traffic to it, we also see a timeout error. Pinging an instance uses the ICMP protocol. All I'm saying is, check whether any change was made to the security groups attached to that instance. One of the attached security groups should allow ICMP traffic from your machine (the one you are pinging from).

    If you still think nothing changed on your side, go to the support console and log a support case. A support engineer will surely help you.

    But to confirm again, there is no issue on the service side.

  • Hi Victor,

    I just added steps for adding security group to your instance, which would allow ICMP protocol and you should be able to ping the instance. Please try it out and let me know how it goes.

  • I understand that you don't have access to create a case with AWS support. Also, I don't think you can ping a container directly. Were you able to ping it earlier? Please refer to this guide to start from scratch and see if it helps.

  • Thanks for the detailed guide! As I mentioned in one of my comments, the account has been wiped out using aws-nuke and thus there is no VPC for me to choose from in step #5. The only thing that exists in the account is the private ECR repository (<my_user_id>.dkr.ecr.us-east-2.amazonaws.com/test-ecr).

    I went ahead regardless and created an EC2 instance just to follow your guide. Then I pinged from my machine and it still fails to reach the ECR hostname.

    I then connected to the newly created EC2 instance and tried pinging from that machine. It also fails there:

    [ec2-user@ip-172-31-3-126 ~]$ ping <my_user_id>.dkr.ecr.us-east-2.amazonaws.com
    PING nlb1-e62e779dc8783879.elb.us-east-2.amazonaws.com (3.16.93.245) 56(84) bytes of data.
    ^C
    --- nlb1-e62e779dc8783879.elb.us-east-2.amazonaws.com ping statistics ---
    251 packets transmitted, 0 received, 100% packet loss, time 259985ms
    
