EC2 instance suddenly unreachable via SSH


I was working on one of many instances (i-0023be12dc6bc88dd) in us-east-1a yesterday when the SSH session stopped responding, and attempts to reconnect timed out. This has happened occasionally before on other instances after a large spike of network traffic, and it usually recovers with an instance restart. That did not work in this case, and everyone else is unable to reach the instance as well.

What I have tried so far:

  1. Instance restart
  2. Instance stop/start
  3. Removing and re-adding the security groups
  4. Resetting my local VPN connection (we reach VPC instances through a VPN and route table)
  5. Checking the ENI's flow logs, which show no traffic from my internal VPN IP during new connection attempts
  6. iptables -F && systemctl restart sshd on the instance
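For reference, the ssh timeouts below are plain TCP connect timeouts, so the symptom can be reproduced without ssh at all. A minimal probe sketch (the address is a placeholder for the instance's private IP, not the real one):

```python
import socket

def tcp_probe(host: str, port: int = 22, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port completes within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable-network errors.
        return False

# Placeholder address -- substitute the instance's private IP.
print(tcp_probe("172.31.128.87"))
```

Running this from the VPN side and from another VPC instance makes it easy to compare the two paths with the same tool.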

What works:

  1. If I SSH into another instance in the VPC (same or different subnet), I can then SSH into the problem instance from there immediately; everything is running and it behaves normally.


~$ ssh -v -i mykey.pem ubuntu@
OpenSSH_7.2p2 Ubuntu-4ubuntu2.10, OpenSSL 1.0.2g  1 Mar 2016
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: /etc/ssh/ssh_config line 19: Applying options for *
debug1: Connecting to [] port 22.
debug1: connect to address port 22: Connection timed out
ssh: connect to host port 22: Connection timed out

From the instance when connected through another instance:

ubuntu@ip-172-31-128-87:~$ sudo systemctl restart sshd
ubuntu@ip-172-31-128-87:~$ sudo ss -tpln | grep -E '22|ssh'
LISTEN   0        128                *:22                    *:*        users:(("sshd",pid=4467,fd=3))
LISTEN   0        128             [::]:22                 [::]:*        users:(("sshd",pid=4467,fd=4))

I'm at a loss for what's next.

asked 2 years ago · 1914 views
2 Answers
Accepted Answer

So I figured out what was happening. This was a dev/test box where we were testing various config stacks with docker-compose and a named networks: section. Each time docker-compose up -d was executed, compose recreated the network and incremented the CIDR block, starting from the default. After about 10 restarts, it created a bridge interface whose CIDR sat on top of the CIDR for our VPN routes.
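This kind of collision can be spotted without reading the routing table by hand, e.g. with Python's ipaddress module. A quick sketch (the subnets here are illustrative, not our real ones):

```python
import ipaddress

# Illustrative subnets -- substitute your VPN routes and the bridge subnets
# reported by the instance's routing table.
vpn_routes = [ipaddress.ip_network("10.8.0.0/16")]
bridge_subnets = [
    ipaddress.ip_network("172.18.0.0/16"),  # typical compose-created bridge
    ipaddress.ip_network("10.8.4.0/24"),    # hypothetical conflicting bridge
]

# Collect every (bridge, vpn route) pair whose address ranges intersect.
conflicts = [
    (bridge, route)
    for bridge in bridge_subnets
    for route in vpn_routes
    if bridge.overlaps(route)
]
for bridge, route in conflicts:
    print(f"bridge {bridge} shadows VPN route {route}")
```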

ubuntu@ip-172-31-128-87:~$ netstat -rn
Kernel IP routing table
Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
                                                UG        0 0          0 ens5
                                                U         0 0          0 docker0
                                                U         0 0          0 br-61dfa3cb04db
                                                U         0 0          0 br-889068f61237
                                                U         0 0          0 ens5
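The reason the instance stayed reachable from inside the VPC is longest-prefix routing: the kernel sends return traffic through the most specific matching route, so the dead docker bridge route beats the default route for VPN client addresses. A sketch of the selection logic (the entries and addresses are hypothetical, modelled on the table above):

```python
import ipaddress

# Hypothetical routing entries -- the conflicting bridge subnet is invented here.
routes = {
    ipaddress.ip_network("0.0.0.0/0"): "ens5",               # default via VPC router
    ipaddress.ip_network("172.17.0.0/16"): "docker0",
    ipaddress.ip_network("10.8.0.0/24"): "br-61dfa3cb04db",  # hypothetical conflict
}

def pick_iface(addr: str) -> str:
    """Return the interface of the most specific route matching addr."""
    ip = ipaddress.ip_address(addr)
    best = max((net for net in routes if ip in net), key=lambda n: n.prefixlen)
    return routes[best]

# Return traffic to a VPN client lands on the dead bridge, not ens5.
print(pick_iface("10.8.0.42"))
```

Traffic from another VPC instance matches only the default route, which is why hopping through a second instance worked.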

This is why it was only accessible via another instance in the VPC. Even after an instance reboot, docker-compose held the network bridge config, even though the containers had crashed. Another docker-compose down cleaned them up, and we were then able to pin the network creation to a non-conflicting CIDR in docker-compose.yml:

      ipam:
        driver: default
        config:
          - subnet:
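For context, the full pinned-network stanza in docker-compose.yml looks roughly like this (the network name and subnet are illustrative placeholders, since ours were specific to the stack):

```yaml
networks:
  appnet:                        # illustrative network name
    ipam:
      driver: default
      config:
        - subnet: 172.28.0.0/16  # any block that does not overlap the VPN routes
```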
answered 2 years ago

From what you are describing here, the right path would be to open a support ticket, as the support engineers have the right tools to analyze the situation and help you diagnose what is happening.


AWS
answered 2 years ago
