Summary:
Recently my EKS cluster nodes stopped working. Both were t3.medium. I deleted the node group and created a new one, which failed with the error "Node unable to join cluster". At the same time I was working on some standard EC2 instances and ran into network issues there as well. The EKS and EC2 problems appear to share the same root cause: when I create a T2 instance of any size, I can make outbound connections for about 5-7 minutes before they start failing. For example, commands like:
curl -v http://www.google.com
When I create a T3 or C-series instance, outbound connections fail from the moment the instance launches. I have checked all the network settings in my VPC and security groups. I have gone through many logs on the instances and, honestly, I'm lost. I also opened a support request 4 days ago and have not received an answer. I think the key to finding the cause is the difference in behavior between T2 and T3. I should also note that my security was compromised sometime before this happened.
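To measure the T2 failure window more precisely than by re-running curl by hand, a small probe like the one below could be left running on a freshly launched instance. This is only a rough sketch: the function names, the 10-second interval, and the choice of a plain TCP connect on port 80 are my own assumptions, not anything AWS-specific.

```python
import socket
import time


def can_connect(host, port=80, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False


def probe(host, port=80, interval=10.0, max_checks=60,
          check=can_connect, sleep=time.sleep, clock=time.monotonic):
    """Poll connectivity; return seconds until the first failure, or None.

    check, sleep, and clock are injectable so the loop can be tested
    without touching the network (hypothetical helper design).
    """
    start = clock()
    for _ in range(max_checks):
        if not check(host, port):
            return clock() - start
        sleep(interval)
    return None
```

Calling `probe("www.google.com")` right after launch should return somewhere around 300-420 seconds on a T2 instance if the 5-7 minute pattern holds, and only a few seconds on a T3 instance where connections never work. A return value of `None` means no failure was seen within `max_checks` attempts.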
Detailed Description:
Environment:
EC2 instance types: T2, T3, C5, I4i
Operating System: Amazon Linux 2, Amazon Linux 2023, Ubuntu, Windows
Region: us-east-2 (I tried us-east-1 and there are no issues there; I'm looking into why now)
Troubleshooting Performed:
- Checked logs using journalctl
- Stopped and checked dependencies for refresh-policy-routes@enX0.service
- Checked network routes and interfaces using ip route, ip a, etc.
- Reviewed NACLs, security groups, route tables, flow logs
- Checked for incoming requests from unrecognized IP addresses. The flow logs contained hundreds of rejected connection attempts from addresses I did not recognize.
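
To make the flow log check above less manual, the rejected records can be counted per source address. The sketch below assumes the flow logs use the default (version 2) record format, where the fourth field is the source address and the thirteenth is the action; a custom log format would need different field positions. The function name is made up for illustration.

```python
from collections import Counter


def count_rejected_sources(lines):
    """Count REJECT records per source address in default-format flow log lines."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        # Default format: version account-id interface-id srcaddr dstaddr
        # srcport dstport protocol packets bytes start end action log-status.
        # Skip short lines and anything that is not a REJECT record.
        if len(fields) < 14 or fields[12] != "REJECT":
            continue
        counts[fields[3]] += 1
    return counts
```

With the log lines exported to a text file, something like `count_rejected_sources(open("flowlogs.txt")).most_common(20)` would list the twenty most frequent rejected sources (the file name here is a placeholder). Sorting this way should show whether the unfamiliar addresses are a few repeat sources or broad background scanning.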