The issue was the DHCP option set on the VPC: it pointed to a local domain managed by a server that was no longer running.
Thanks everyone. Problem Solved.
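For anyone hitting the same symptom, here is a minimal Python sketch of how the DHCP option set on a VPC could be inspected. It assumes boto3 is installed and credentials are configured; the function names `custom_dns_servers` and `fetch_dhcp_configurations` are made up for illustration and are not part of any AWS tooling.

```python
from typing import Any

# Value AWS uses in a DHCP option set for the VPC's built-in resolver.
AMAZON_DNS = "AmazonProvidedDNS"


def custom_dns_servers(dhcp_configurations: list[dict[str, Any]]) -> list[str]:
    """Return DNS servers in a DHCP option set other than the Amazon-provided resolver."""
    servers = []
    for config in dhcp_configurations:
        if config.get("Key") == "domain-name-servers":
            for entry in config.get("Values", []):
                value = entry.get("Value")
                if value and value != AMAZON_DNS:
                    servers.append(value)
    return servers


def fetch_dhcp_configurations(vpc_id: str, region: str) -> list[dict[str, Any]]:
    """Look up the DHCP option set attached to a VPC (needs boto3 and credentials)."""
    import boto3  # imported lazily so custom_dns_servers works without boto3

    ec2 = boto3.client("ec2", region_name=region)
    vpc = ec2.describe_vpcs(VpcIds=[vpc_id])["Vpcs"][0]
    options = ec2.describe_dhcp_options(DhcpOptionsIds=[vpc["DhcpOptionsId"]])
    return options["DhcpOptions"][0]["DhcpConfigurations"]
```

Calling `custom_dns_servers(fetch_dhcp_configurations(vpc_id, region))` with your own VPC ID and region lists any custom DNS servers the instances are being told to use. If one of them is no longer running, name resolution from the instances will time out.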
How are you using NAT Gateway for the SSM in the first VPC where it works? SSM will require VPC endpoints in the same VPC where the instance is. Check the prerequisites: https://docs.aws.amazon.com/prescriptive-guidance/latest/patterns/connect-to-an-amazon-ec2-instance-by-using-session-manager.html#connect-to-an-amazon-ec2-instance-by-using-session-manager-prereqs
A KB that always works for me: https://repost.aws/knowledge-center/ec2-systems-manager-vpc-endpoints
Were there any Security Group or NACL rule changes? If a VPC endpoint is not used, SSM needs an outbound connection to the SSM public endpoint on port 443 so that the agent can check in with the SSM service.
No changes there. All of our resources are in their own subnet, and they aren't connecting to SSM. Both VPCs have a similar setup: public and private subnets, with routes to the NAT Gateway from the private subnets.
Additionally, a server in the public subnet can't connect either. But I should be able to connect to that one. More to come on that.
I feel like I am solving a simple issue in public, just can't find it. Ain't that always the case?
Thanks all.
This is likely the issue:
2024-08-22 00:01:44 INFO [ssm-agent-worker] Entering SSM Agent hibernate - RequestError: send request failed caused by: Post "https://ssm.us-gov-west-1.amazonaws.com/": dial tcp: lookup ssm.us-gov-west-1.amazonaws.com on 127.0.0.53:53: read udp 127.0.0.1:54693->127.0.0.53:53: i/o timeout
DNS Resolution is enabled in both VPCs.
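The log line above fails at the DNS lookup, before any connection to port 443 is attempted. A rough Python sketch for telling those two failure modes apart from the instance itself is below. The function name `check_endpoint` and its `resolve`/`connect` parameters are my own invention, there only so the check can be exercised without network access.

```python
import socket


def check_endpoint(host, port=443, timeout=5.0,
                   resolve=socket.getaddrinfo,
                   connect=socket.create_connection):
    """Report whether a host fails at DNS resolution or at the TCP connection."""
    # Step 1: name resolution. A dead or unreachable resolver fails here.
    try:
        resolve(host, port)
    except socket.gaierror as exc:
        return f"dns_failure: {exc}"

    # Step 2: TCP connection. Routing, NAT, Security Group or NACL problems fail here.
    try:
        conn = connect((host, port), timeout=timeout)
    except OSError as exc:
        return f"connect_failure: {exc}"

    conn.close()
    return "ok"
```

Running `check_endpoint("ssm.us-gov-west-1.amazonaws.com")` on an affected instance should return a `dns_failure` result if the resolver is the problem, which points away from the NAT Gateway and toward the VPC's DNS settings.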
Thanks for this. I am not using endpoints at all. The NAT Gateway is in a public subnet, and routes to any IP from the private subnet (where the instances are) are directed to the NAT Gateway. The architectures are the same between the two. Looking at Fleet Manager, all of the instances in the failing VPC have lost their connections, which is why I suspect something with the NAT Gateway in the failing VPC.
Thanks again.