Why can't I connect to my Amazon EC2 Linux instance when the health status checks pass?

I can't connect to my Amazon Elastic Compute Cloud (Amazon EC2) Linux instance even though the health status checks pass.

Resolution

Note: It's a best practice to maintain backups of your instances and data. Before you troubleshoot or make changes, create an Amazon Machine Image (AMI) or create snapshots of your Amazon Elastic Block Store (Amazon EBS) volumes.

"Connection timeout" error

To troubleshoot the "ssh: connect to host <hostname> port 22: Connection timed out" error, check for the following issues:

Security group, routing, or network ACL misconfigurations
Check whether the security group inbound rule allows port 22 from your source server. Check that the outbound security group rule allows traffic to 0.0.0.0/0. For more information on security group rules, see rules to connect to instances from your computer.

To connect from an on-premises client to a private instance, check that there's a VPN connection between your local network and the Amazon Virtual Private Cloud (Amazon VPC). Or, connect to the instance from a jump server within the same VPC or subnet.

If you can connect from the jump server, then the issue is likely with your on-premises client or the AWS Site-to-Site VPN connection.

If you have a strict network access control list (network ACL) setup, then check that the ingress rule allows port 22. Check that the egress rule allows all traffic to 0.0.0.0/0.
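
The checks above can be narrowed down with a quick probe from the client. The following is a minimal sketch, not an official AWS tool: the function names probe_tcp_port and classify_probe_status are hypothetical, and the sketch assumes that bash and the GNU coreutils timeout command are available on the client. A result of "timed out" usually means that packets are dropped on the way (security group, network ACL, or routing), while "refused or unreachable" usually means that the host answered but nothing accepted the connection.

```shell
# Map the exit status of the probe to a human-readable label.
classify_probe_status() {
  case "$1" in
    0)   echo "open" ;;
    124) echo "timed out" ;;              # status that GNU timeout returns when the limit is hit
    *)   echo "refused or unreachable" ;;
  esac
}

# Try to open a TCP connection to host:port (default port 22) and give up
# after 5 seconds. Uses the /dev/tcp redirection that bash provides.
probe_tcp_port() {
  timeout 5 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "${2:-22}" 2>/dev/null
  classify_probe_status "$?"
}
```

For example, run probe_tcp_port <hostname> 22 from the on-premises client and again from a jump server in the same VPC, and then compare the two results.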

Local firewall settings
Check for an OS-level firewall such as iptables or firewalld. To troubleshoot a blocked connection, review your firewall configuration.
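
As a starting point for that review, the following sketch lists which common firewall front ends are installed and the command that's typically used to print their rules. The function names are hypothetical. The article names only iptables and firewalld, so the ufw and nft entries are additions for distributions that use those tools instead.

```shell
# Print the command that lists the active rules for a given firewall tool.
firewall_review_command() {
  case "$1" in
    firewall-cmd) echo "sudo firewall-cmd --list-all" ;;
    ufw)          echo "sudo ufw status verbose" ;;
    nft)          echo "sudo nft list ruleset" ;;
    iptables)     echo "sudo iptables -L -n -v --line-numbers" ;;
    *)            return 1 ;;
  esac
}

# Print a review command for every firewall tool found in PATH.
# Some distributions keep these tools in /usr/sbin, which might not be in
# the PATH of a non-root user.
suggest_firewall_reviews() {
  for tool in firewall-cmd ufw nft iptables; do
    if command -v "$tool" >/dev/null 2>&1; then
      firewall_review_command "$tool"
    fi
  done
}
```

Run suggest_firewall_reviews on the instance, and then run the printed commands to look for rules that drop or reject traffic on port 22.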

Out-of-memory (OOM) issues related to SSH
Check whether your instance has an OOM issue. When the instance runs out of memory, the operating system (OS) can become unresponsive or severely degraded, and network requests time out. To troubleshoot OOM issues related to SSH, check the system logs and resource usage.

Check system logs:

Access system logs through AWS Systems Manager or EC2 Serial Console (if SSH is unavailable).

To check OOM messages, run the following command:

sudo dmesg | grep -iE 'oom|out of memory'

To review the Amazon Linux, RHEL, or CentOS system log, run the following command:

sudo less /var/log/messages

To review the Ubuntu or Debian system log, run the following command:

sudo less /var/log/syslog 

You might receive output that's similar to the following example:

Aug 17 10:00:01 ip-172-31-1-1 kernel: [123456.789012] Out of memory: Kill process 1234 (myprocess) score 950 or sacrifice child
Aug 17 10:00:01 ip-172-31-1-1 kernel: [123456.789013] Killed process 1234 (myprocess) total-vm:500000kB, anon-rss:200000kB, file-rss:50000kB
Aug 17 10:00:01 ip-172-31-1-1 kernel: [123456.789014] oom_reaper: reaped process 1234 (myprocess), now anon-rss:0kB, file-rss:0kB
Aug 17 10:00:01 ip-172-31-1-1 kernel: [123456.789015] OOM killer disabled.
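
To see at a glance which processes the OOM killer ended, the "Killed process" lines can be reduced to a PID and a process name. This is a small sketch with a hypothetical function name. It assumes that the kernel messages follow the "Killed process <pid> (<name>)" form shown in the example output, which can vary between kernel versions.

```shell
# Read log lines on stdin and print "pid=<pid> name=<name>" for every
# "Killed process <pid> (<name>)" message that the kernel logged.
summarize_oom_kills() {
  sed -nE 's/.*Killed process ([0-9]+) \(([^)]*)\).*/pid=\1 name=\2/p'
}
```

For example, on Amazon Linux, RHEL, or CentOS, run sudo cat /var/log/messages | summarize_oom_kills. On Ubuntu or Debian, use /var/log/syslog instead.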

Monitor resource usage:
To monitor the resource use of your system, choose from the following commands:

To check memory use, run the following command:

free -m

To check processes, run the following command:

top

Then, note the processes that consume the most system resources, such as memory or CPU.

To check swap use, run the following command:

sudo swapon --show

Check historical resource use with the sar command. Historical data is available only if sysstat was already collecting it before the issue occurred.
If the sysstat package isn't installed, first run the following command to install it on Amazon Linux, RHEL, or CentOS:

sudo yum install sysstat

On Ubuntu or Debian, run sudo apt install sysstat instead.

To view historical memory use, run the following command:

sar -r

To view historical CPU use, run the following command:

sar -u

To identify the processes that consume the most memory, run the following command:

ps -eo pmem,pid,user,args --sort=-pmem | head -10

To identify the processes that consume the most CPU, run the following command:

ps -eo pid,user,ppid,cmd,%mem,%cpu --sort=-%cpu | head

To check whether the root volume is full, run the following command:

df -Th

If resource use is consistently high for your workload, then consider upgrading to a larger instance type. For more information, see monitor performance with System Activity Reporter (SAR) or configure monitoring tools.
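
The memory and disk checks above can also be reduced to single numbers that are easy to compare against a threshold. The following is a sketch with hypothetical function names. It assumes a Linux kernel that reports MemAvailable in /proc/meminfo, and the thresholds in the comments are arbitrary illustrations, not AWS recommendations.

```shell
# Print available memory as a whole-number percentage of total memory.
# Expects /proc/meminfo-style text on stdin.
mem_available_percent() {
  awk '/^MemTotal:/     { total = $2 }
       /^MemAvailable:/ { avail = $2 }
       END { if (total > 0) printf "%d\n", avail * 100 / total }'
}

# Print the use percentage (without the % sign) of the first file system
# listed in "df -P" output on stdin.
fs_use_percent() {
  awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Example use on the instance:
#   [ "$(mem_available_percent < /proc/meminfo)" -lt 10 ] && echo "Low memory"
#   [ "$(df -P / | fs_use_percent)" -ge 95 ] && echo "Root volume nearly full"
```

Both functions read from stdin, so they can also be run against output that was saved earlier through the serial console or copied from a rescue instance.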

If the serial console isn't configured for the instance, then see EC2 Serial Console Access to activate it.

If you can't use the serial console, then use a rescue instance to investigate system logs. See steps 1-8 to troubleshoot OS-level issues, and then check /var/log/messages or /var/log/syslog.

"Connection refused" error

If you receive the "Connection refused" error, then connect to the instance through the serial console to troubleshoot the issue.
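
A "Connection refused" error typically means that the instance is reachable but nothing accepts connections on the SSH port. After you connect through the serial console, a first check is whether anything listens on port 22. The following sketch uses a hypothetical function name and assumes that the ss utility from iproute2 is installed and that the SSH service is managed by systemd.

```shell
# Succeed if "ss -tln" output on stdin shows a listener on the given port
# (default 22).
port_has_listener() {
  awk -v port="${1:-22}" '
    $1 == "LISTEN" && $4 ~ (":" port "$") { found = 1 }
    END { exit !found }'
}

# On the instance, through the serial console:
#   sudo ss -tln | port_has_listener 22 || echo "Nothing is listening on port 22"
#   sudo systemctl status sshd   # the unit is named ssh on Ubuntu and Debian
#   sudo sshd -t                 # checks the sshd configuration file for errors
```

If nothing listens on port 22, then check whether the SSH service failed to start or is configured to use a different port.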

"Host key verification failed" error

If you receive the "Host key verification failed" error, then the SSH client detected a mismatch between the server host key and the previously stored key. This can happen when the server's host key changes, for example after a server reinstallation. It can also indicate a security issue, so verify the new key before you trust it.

Remove the old host key
To fix the verification error, remove the outdated or incorrect host key from your known_hosts file.
The file is located at ~/.ssh/known_hosts on Unix-based systems (Linux, macOS) or C:\Users\YourUsername\.ssh\known_hosts on Windows.

To delete the old host key entry that's associated with the specified hostname or IP address from the known_hosts file, run the following command:

ssh-keygen -R <hostname-or-IP-address>

Note: In the Windows path, replace YourUsername with your username. In the command, replace <hostname-or-IP-address> with the hostname or IP address of the server that you want to connect to.

Reconnect to the server
After you remove the old host key, use SSH to reconnect to the server. Verify the new host key fingerprint. Accept the new host key to add it to your known_hosts file.
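
One way to verify the new fingerprint before you accept it is to compare what the server presents with a value that you obtained through a trusted channel, such as the instance's console output if the fingerprints appear there. The following sketch uses hypothetical function names and assumes OpenSSH 7.2 or later, which lets ssh-keygen read keys from stdin. Note that ssh-keyscan alone doesn't prove authenticity, because it reads the key over the same network path.

```shell
# Print the SHA256 fingerprints of the host keys that a server presents.
scan_fingerprints() {
  ssh-keyscan "$1" 2>/dev/null | ssh-keygen -lf - | awk '{ print $2 }'
}

# Succeed only if the expected fingerprint ($1) appears as a whole line
# in the list of fingerprints on stdin.
fingerprint_matches() {
  grep -qxF -- "$1"
}

# Example, where the expected value comes from a trusted source:
#   scan_fingerprints <hostname-or-IP-address> | fingerprint_matches "SHA256:<expected-fingerprint>" \
#     && echo "Fingerprint matches" || echo "Fingerprint does NOT match"
```

Accept the new host key only if the comparison succeeds.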

"Permission denied (publickey)" error

If you receive a "Permission denied (publickey)" error, then the SSH client can't use the provided credentials to authenticate with the server. To troubleshoot the error, make sure that the private key and the username are correct for the instance. The default username depends on the AMI, for example ec2-user for Amazon Linux or ubuntu for Ubuntu.
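
A related client-side cause is a private key file that other users can read, which the OpenSSH client refuses to use. The following sketch checks the file mode before you connect. The function name is hypothetical, and the sketch assumes that either the GNU or the BSD form of the stat command is available.

```shell
# Succeed if the key file grants no permissions to group or others
# (for example, mode 400 or 600). Return 2 if the mode can't be read.
key_permissions_ok() {
  mode="$(stat -c '%a' "$1" 2>/dev/null || stat -f '%Lp' "$1" 2>/dev/null)" || return 2
  case "$mode" in
    *00) return 0 ;;
    *)   return 1 ;;
  esac
}

# Example:
#   key_permissions_ok my-key.pem || chmod 400 my-key.pem
#   ssh -v -i my-key.pem ec2-user@<hostname-or-IP-address>
```

The -v option makes the SSH client print which keys it offers and how the server responds, which helps to confirm whether the key or the username is rejected. The file name my-key.pem and the user ec2-user are placeholders.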

Session Manager automation to check common SSH issues

If the instance has Session Manager configured, then run the AWSSupport-TroubleshootSSH automation runbook. The runbook installs the Amazon EC2Rescue tool for Linux. Then, the runbook uses the tool to check or fix common issues that prevent a remote connection to the Linux machine through SSH.

Related information

How do I troubleshoot an EC2 Linux instance that failed the instance status check due to operating system issues?

Why is my EC2 Linux instance unreachable and failing its status checks?

Why can't I use Session Manager to connect to my Amazon EC2 instance?

AWS OFFICIAL
Updated a month ago