All ports appear to be blocked on all resources


Hi

This is a strange issue that I have never encountered before.

All my resources are in eu-west-1.

Since 25th July, my site has been down. The load balancer reports that the health check on the EC2 instance is failing, and the Auto Scaling group (ASG) went into a perpetual cycle of terminating the instance and launching a new one.

I suspended a couple of the ASG processes to break this cycle, then tried to SSH in as I normally do, and the SSH connection timed out.

I checked the firewall rules, and SSH access is allowed. The EC2 instances are in a VPC. This VPC has a subnet, and the route table for this subnet has a route to a gateway with the correct destination. It was all working before the 25th; I'm the only person with access to this AWS account, and I made no changes.
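To double-check the two things described above (an inbound SSH rule and a route to the gateway) against the raw API data rather than the console view, a minimal sketch like the following could help. It assumes dictionaries shaped like the `IpPermissions` list from the EC2 `DescribeSecurityGroups` API and the `Routes` list from `DescribeRouteTables`; the function names and sample IDs are hypothetical. It only considers IPv4 CIDR sources and IPv4 destination routes, and ignores security-group references and prefix lists.

```python
import ipaddress


def sg_allows_tcp(ip_permissions, port, source_ip):
    """Return True if any ingress permission allows TCP `port` from `source_ip`.

    Assumes the IpPermissions shape from DescribeSecurityGroups; only IPv4
    `IpRanges` are checked (hypothetical helper, not an AWS API).
    """
    src = ipaddress.ip_address(source_ip)
    for perm in ip_permissions:
        proto = perm.get("IpProtocol")
        if proto == "-1":
            port_ok = True  # "-1" means all protocols and all ports
        elif proto in ("tcp", "6"):
            port_ok = perm.get("FromPort", 0) <= port <= perm.get("ToPort", 65535)
        else:
            continue  # UDP, ICMP, etc. are irrelevant for SSH
        if not port_ok:
            continue
        for rng in perm.get("IpRanges", []):
            if src in ipaddress.ip_network(rng["CidrIp"]):
                return True
    return False


def select_route(routes, dest_ip):
    """Return the most specific (longest-prefix) IPv4 route matching dest_ip.

    A matching route is returned even if its State is "blackhole", so the
    caller can see that traffic would be dropped (hypothetical helper).
    """
    dest = ipaddress.ip_address(dest_ip)
    best, best_len = None, -1
    for route in routes:
        cidr = route.get("DestinationCidrBlock")
        if cidr is None:
            continue  # skip IPv6 and prefix-list routes in this sketch
        net = ipaddress.ip_network(cidr)
        if dest in net and net.prefixlen > best_len:
            best, best_len = route, net.prefixlen
    return best
```

A VPC uses the most specific matching route, so a narrower route pointing somewhere other than the internet gateway would take precedence over `0.0.0.0/0`; checking the returned route's `GatewayId` and `State` shows which one actually applies.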

Then today, I tried connecting remotely to the RDS instance, and that failed too.

Connecting to the instances, even new ones, via the AWS Console also fails.

I've checked, and all billing is up to date. I have even checked with AWS Billing Support, and they have confirmed that all bills are up to date.

I then tested with a new server in a new region (eu-west-2), and the SSH connection to that timed out too.

I then tested my SSH connection to a server I have in another (unrelated) AWS account, and I can connect to that perfectly fine.

So essentially, it looks like all ports are blocked on all resources, which would also explain why the load balancer health checks are failing. However, the firewall rules don't reflect this, and no changes have been made.
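One way to narrow down "all ports blocked" is to classify what a TCP connect attempt actually does. As a rule of thumb, a timeout usually means packets are being dropped somewhere (security group, NACL, routing) or the host is not responding at all, whereas "connection refused" usually means the host was reached but nothing is listening on that port. A minimal standard-library sketch; `probe_tcp` is a hypothetical helper name:

```python
import socket


def probe_tcp(host, port, timeout=3.0):
    """Classify a TCP connect attempt as 'open', 'refused' or 'timeout'.

    Hypothetical diagnostic helper; any other socket error is returned
    as 'error: ...' so it is not mistaken for a firewall drop.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"  # three-way handshake completed
    except ConnectionRefusedError:
        return "refused"  # host answered with a reset: nothing listening
    except socket.timeout:
        return "timeout"  # no answer: dropped in transit or host unresponsive
    except OSError as exc:
        return f"error: {exc}"
```

For example, call `probe_tcp` with the instance's public IP and port 22, and with the RDS endpoint and its engine port (3306 for MySQL/MariaDB, 5432 for PostgreSQL; the engine in use isn't stated above). A refused result on SSH would point at the instance itself rather than the network path.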

Any ideas?

1 Answer

You have different options, but what you describe sounds like something is misconfigured: routes, security groups, NACLs, or something else. Review the configuration carefully; it is not difficult. You can use tools like Network Access Analyzer to see which paths your network traffic takes: https://docs.aws.amazon.com/vpc/latest/network-access-analyzer/what-is-network-access-analyzer.html
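To illustrate how a network ACL is evaluated: rules are checked in ascending rule-number order, the first matching rule decides, and the `*` rule denies anything no other rule matched. NACLs are also stateless, so return traffic must be allowed by the outbound rules too. The sketch below assumes entries shaped like the `Entries` list from the EC2 `DescribeNetworkAcls` API (`RuleNumber`, `Protocol` as a string such as "6" for TCP or "-1" for all, `RuleAction`, `Egress`, `CidrBlock`, `PortRange`); `nacl_decision` is a hypothetical helper, and IPv6 entries are ignored.

```python
import ipaddress


def nacl_decision(entries, egress, protocol, port, ip):
    """Return 'allow' or 'deny' for one packet under NACL first-match rules.

    `protocol` is the IANA number as a string ("6" = TCP), matching the
    DescribeNetworkAcls shape. Hypothetical helper, IPv4 entries only.
    """
    addr = ipaddress.ip_address(ip)
    direction = [e for e in entries if e["Egress"] == egress]
    for entry in sorted(direction, key=lambda e: e["RuleNumber"]):
        if entry["Protocol"] not in ("-1", protocol):
            continue
        port_range = entry.get("PortRange")
        if entry["Protocol"] != "-1" and port_range:
            if not port_range["From"] <= port <= port_range["To"]:
                continue
        cidr = entry.get("CidrBlock")
        if cidr is None or addr not in ipaddress.ip_network(cidr):
            continue
        return entry["RuleAction"]  # first match wins
    return "deny"  # the implicit '*' rule
```

With a default NACL (rule 100 allowing all traffic, rule * denying), every port evaluates to allow, so a NACL of that shape would not by itself explain the timeouts.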

Best.

AWS
answered 9 months ago
  • I don't think it's a misconfiguration, as everything was working fine until 25th July and then stopped without any intervention from anyone.

    I also set up a new EC2 instance in a new region, with a new security group that has port 22 open to the world, and still couldn't SSH in. That suggests something else is amiss here.

  • It does sound like either a NACL or Route Table issue.

    Check CloudTrail for changes that may have happened on or around 25 July. Check IAM for new users or API keys, and make sure your account has not been compromised.

  • Really stumped on this one.

    The EC2 instances are associated with subnet-12345. This subnet is associated with vpc-12345, and the VPC is associated with route table rtb-12345. This route table has a route to an internet gateway with destination 0.0.0.0/0, marked as Active. I can't even load the system log from the console; I just get an empty console-style window.

    The network ACLs are also the defaults: rule 100 allows all traffic and rule * denies all traffic, for both inbound and outbound.

    The security group associated with the instances also allows SSH traffic from 0.0.0.0/0 whilst I diagnose the issue.

  • Further update: I set up a new server in eu-west-2 this morning. I have never had any resources in this region before, so I was essentially starting from scratch. Exactly the same issue: I can't seem to reach any ports.

    I then tried the same in us-east-1. Again, I have never had any resources in that region, so once more I was starting from scratch. Strangely, everything works fine! I then compared the security groups, subnet, VPC, network ACL and route table between us-east-1 and eu-west-1/eu-west-2; they are 100% identical.

    Something really strange happening here.

  • Another update, in case anyone is able to shed any light on this.

    I have isolated the issue to the AMI, but I don't understand why or how. If I launch a brand-new instance in any region and attach it to the existing security group, subnet and VPC (and therefore the same route table and NACLs), it works fine. As soon as I launch an instance from my AMI, however, it fails, even though this AMI was created in April 2023 and has been in use ever since.

    I have tried launching a number of different instance types across different instance classes, in case something has changed and the instance type I used previously (c4.2xlarge) can no longer handle it, but I still have the same issue. I cannot get any system logs from the console either; I just see a blank terminal window. I also tried a Nitro-based instance so that I could log in using the EC2 Serial Console, but that also just produces a blank terminal window.

    This seems to point to the server becoming unresponsive as soon as it launches from an AMI that was created in April and worked fine until about 10 days ago. Any help is much appreciated.
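The CloudTrail check suggested in the comments could be scripted roughly as follows. This is a sketch assuming a boto3 CloudTrail client (for example `boto3.client("cloudtrail", region_name="eu-west-1")`) is passed in. `LookupEvents` with the `ReadOnly` lookup attribute set to `false` returns only write (mutating) management events, and event history only covers roughly the last 90 days. `write_events_between` is a hypothetical helper name.

```python
def write_events_between(client, start, end):
    """Yield (time, event name, user) for write events in [start, end].

    `client` is assumed to be a boto3 CloudTrail client; pagination is
    followed via NextToken. Hypothetical helper, not part of boto3.
    """
    kwargs = {
        # Only one lookup attribute is allowed per call; ReadOnly=false
        # filters out Describe*/Get*/List* style read calls.
        "LookupAttributes": [{"AttributeKey": "ReadOnly", "AttributeValue": "false"}],
        "StartTime": start,
        "EndTime": end,
    }
    while True:
        page = client.lookup_events(**kwargs)
        for event in page.get("Events", []):
            yield event["EventTime"], event["EventName"], event.get("Username", "?")
        token = page.get("NextToken")
        if not token:
            break
        kwargs["NextToken"] = token
```

Run it per region with a window around 25 July (2023 is assumed here, based on the AMI date), and repeat for us-east-1, where events from global services such as IAM are typically recorded. Unexpected `CreateAccessKey`, `CreateUser`, or security-group and route changes would be worth investigating.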
