Root cause for AWS EC2 disconnection.

0

Hi,

One of our production EC2 instances (Ubuntu 18.04) became unreachable, and we were unable to connect to it via SSH. After we rebooted the instance, we were able to connect again. Can someone point me to where we should look for the root cause of this problem? I could not find any useful information in the instance's system logs either.

Asked 2 years ago, 241 views
3 Answers
0

Actually, CPU utilization was at almost 0%. This is the first time we have faced this issue. The instance type is c5.2xlarge. We have also checked the disk usage, which was at 55%. Is it possible that some internal OS error on Ubuntu crashed the instance and made it unreachable? If so, where could I find the logs?

Answered 2 years ago
  • Server logs are saved in the /var/log directory. You might want to check kern.log or syslog to rule out an OS crash.
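
As a rough illustration of the check suggested above, the following sketch scans kern.log and syslog for messages that often accompany a crash, hang, or memory exhaustion. The pattern list and the helper names `find_crash_lines` and `scan_log` are assumptions made for this example, not an official or exhaustive set. Reading these files usually requires root or membership in the adm group, and older entries may have rotated into files such as kern.log.1.

```python
import re

# Patterns that commonly accompany a kernel crash, hang, or memory exhaustion.
# This list is illustrative only, not exhaustive.
CRASH_PATTERNS = re.compile(
    r"kernel panic|out of memory|oom-killer|call trace|"
    r"blocked for more than|soft lockup",
    re.IGNORECASE,
)


def find_crash_lines(lines):
    """Return (line_number, text) pairs for lines matching a crash pattern."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if CRASH_PATTERNS.search(line)
    ]


def scan_log(path):
    """Scan one log file; a missing or unreadable file yields no hits."""
    try:
        with open(path, errors="replace") as handle:
            return find_crash_lines(handle)
    except OSError:
        return []


if __name__ == "__main__":
    for name in ("/var/log/kern.log", "/var/log/syslog"):
        for number, text in scan_log(name):
            print(f"{name}:{number}: {text}")
```

An empty result does not rule out a crash: a hard hang can stop logging before anything is written to disk.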

0

You should have a look at the metrics for the instance, especially the CPU utilization and status check failed metrics. Check whether there was high CPU utilization around the time you had to reboot the instance. Depending on the instance type, also look at CPU credit usage and balance.

Is it a recurring issue or a one-off incident? What is the instance type?
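
The same metrics can also be pulled programmatically. Below is a minimal sketch using boto3's CloudWatch `get_metric_statistics` call, assuming boto3 is installed and AWS credentials are configured. The helper names `build_query` and `fetch_metrics`, the six-hour window, and the five-minute period are choices made for this example. The CPU credit metrics are left out because they apply only to burstable (T-family) instances.

```python
from datetime import datetime, timedelta, timezone

# Metric names published by EC2 in the AWS/EC2 namespace.
METRICS = ("CPUUtilization", "StatusCheckFailed_System", "StatusCheckFailed_Instance")


def build_query(instance_id, metric_name, end, hours=6, period=300):
    """Build keyword arguments for CloudWatch get_metric_statistics."""
    return {
        "Namespace": "AWS/EC2",
        "MetricName": metric_name,
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "StartTime": end - timedelta(hours=hours),
        "EndTime": end,
        "Period": period,  # seconds per data point
        "Statistics": ["Maximum"],
    }


def fetch_metrics(instance_id, end, region):
    """Query CloudWatch for each metric; needs boto3 and AWS credentials."""
    import boto3  # imported lazily so build_query works without boto3

    client = boto3.client("cloudwatch", region_name=region)
    results = {}
    for name in METRICS:
        response = client.get_metric_statistics(**build_query(instance_id, name, end))
        # Datapoints are not returned in time order, so sort them.
        results[name] = sorted(
            response["Datapoints"], key=lambda point: point["Timestamp"]
        )
    return results
```

For example, `fetch_metrics("i-0123456789abcdef0", datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc), "us-east-1")` would return the data points for the six hours before that time. The instance ID, timestamp, and region here are placeholders. A nonzero StatusCheckFailed_System value points toward the underlying host or network, while StatusCheckFailed_Instance points toward the operating system.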

Syd
Answered 2 years ago
0

Hello,

There is a possibility that this could be an operating system issue, such as a crash or kernel panic. Was there an error message when trying to connect?

I would suggest checking out this Ubuntu documentation on how to view your logs.

https://ubuntu.com/tutorials/viewing-and-monitoring-log-files#2-log-files-locations

Some specific logs you may find useful could include:

Authorization log = /var/log/auth.log, which keeps track of authorization systems, such as password prompts, the sudo command, and remote logins.

Login failures log = /var/log/faillog, which contains information about login failures. You can view it with the faillog command.

Daemon log = /var/log/daemon.log, which records messages from daemons. Daemons are programs that run in the background, usually without user interaction, for example the display server, SSH sessions, printing services, Bluetooth, and more. This log could indicate whether there were issues with SSH.

System log = /var/log/syslog, which contains more information about your system. If you can’t find anything in the other logs, it’s probably here.
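
When none of these logs contains an explicit error, the longest silent interval in /var/log/syslog can at least narrow down when the instance stopped responding. The sketch below is one way to find it. It assumes the traditional `Mon DD HH:MM:SS` syslog timestamp format and an English locale, and because that format has no year, the year must be supplied. The helper names `parse_timestamp` and `largest_gap` are made up for this example.

```python
from datetime import datetime


def parse_timestamp(line, year):
    """Parse the leading 'Mon DD HH:MM:SS' stamp of a traditional syslog line."""
    try:
        return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
    except ValueError:
        return None  # continuation line or a different timestamp format


def largest_gap(lines, year):
    """Return (gap, last_line_before, first_line_after) for the longest silence."""
    best = None
    previous_time, previous_line = None, None
    for line in lines:
        current = parse_timestamp(line, year)
        if current is None:
            continue
        if previous_time is not None:
            gap = current - previous_time
            if best is None or gap > best[0]:
                best = (gap, previous_line.rstrip("\n"), line.rstrip("\n"))
        previous_time, previous_line = current, line
    return best


if __name__ == "__main__":
    try:
        with open("/var/log/syslog", errors="replace") as handle:
            result = largest_gap(handle, datetime.now().year)
    except OSError:
        result = None
    if result:
        gap, before, after = result
        print(f"Longest gap: {gap}\n  last entry before: {before}\n  first entry after: {after}")
```

The last entries before the gap are usually the most informative place to start reading. A log file that spans a year boundary would need extra handling that this sketch omits.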

Also, here are some articles with general EC2 troubleshooting tips:

https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-instance-troubleshoot.html

https://docs.aws.amazon.com/en_us/AWSEC2/latest/UserGuide/TroubleshootingInstancesConnecting.html#TroubleshootingInstancesConnectionTimeout

Answered 7 months ago
AWS
Support Engineer
Reviewed 7 months ago
