EC2 Instance going unresponsive repeatedly


Hi,

We have a new EC2 instance that we want to use for some self-hosting. The instance first became unresponsive within a few hours of setup, while I was installing something. The SSH session dropped, and when I tried to reconnect, the attempt hung indefinitely without ever timing out. This sounded like resource exhaustion to me, so I checked: the syslogs never showed any of the typical warnings (CPU, memory or disk). There was also a big gap in the syslogs where nothing was reported at all. The dashboard never showed CPU utilization above 62% (and this is a pretty big instance). After 1-2 hours, the status check also started to report that the reachability check had failed.

We rebooted the instance and it started to work fine again. I was able to finish the install process and continued the setup through the web interface. After I changed and saved some standard settings, the instance became unresponsive again. First the web interface stopped loading, and then I couldn't connect via SSH either, same as before. The CPU utilization and other stats have shown almost no utilization since this happened again last night. The reachability status is also unchanged.

What could be the cause of this, and what should our course of action be? It sounds to me like this could be hardware related. At least, I don't have any other ideas about what could lead to this behavior.

  • Could you be more specific about "pretty big instance"? What instance type and size is it?

fjahr
asked 8 months ago, 198 views
2 Answers

If a hardware issue were affecting the availability of the EC2 service, it should also be noted here: https://health.aws.amazon.com/health/home#/account/dashboard/open-issues. In which region is your instance?

Is this just a normal on-demand instance that you've spun up, and not a spot instance that could have been shut down by AWS at short notice?

It sounds like a resource exhaustion issue, and it would be worthwhile setting up the CloudWatch agent to collect more detailed system logs: https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/Install-CloudWatch-Agent.html

Appreciate the following may be time-consuming, but as an exercise (to see if this is repeatable) would you be able to install CloudWatch agent on a fresh EC2 instance and then go through the software installation & setup that you were doing? This may point to where the bottleneck is.
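
For context on why the agent helps here: the default EC2 metrics in CloudWatch do not include memory or disk usage, so a dashboard that only shows CPU can look healthy while memory is exhausted. Below is a minimal sketch of an agent configuration that adds memory, swap and disk metrics and ships the syslog file. It is written in Python only to generate the JSON; the log path `/var/log/syslog`, the log group name `ec2-unresponsive-debug` and the function name are assumptions for illustration, so adjust them to your distribution and setup.

```python
import json


def build_agent_config(log_path="/var/log/syslog",
                       log_group="ec2-unresponsive-debug"):
    """Return a CloudWatch agent configuration as a plain dict.

    log_path and log_group are assumed values, not taken from the question.
    """
    return {
        # Collect metrics once per minute.
        "agent": {"metrics_collection_interval": 60},
        "metrics": {
            # Tag every metric with the instance ID.
            "append_dimensions": {"InstanceId": "${aws:InstanceId}"},
            "metrics_collected": {
                "mem": {"measurement": ["mem_used_percent"]},
                "swap": {"measurement": ["swap_used_percent"]},
                "disk": {"measurement": ["used_percent"], "resources": ["/"]},
            },
        },
        "logs": {
            "logs_collected": {
                "files": {
                    "collect_list": [
                        {
                            "file_path": log_path,
                            "log_group_name": log_group,
                            "log_stream_name": "{instance_id}",
                        }
                    ]
                }
            }
        },
    }


if __name__ == "__main__":
    # Print the JSON so it can be saved as the agent's configuration file.
    print(json.dumps(build_agent_config(), indent=2))
```

Because the logs are shipped off the instance, whatever was written just before the next hang should still be readable in CloudWatch even if the instance itself can no longer be reached.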

EXPERT
Steve_M
answered 8 months ago

Have you checked the EC2 instance status in the Management Console while the connection issue is happening? You could also try connecting through SSM and see whether the same thing happens in that session.

FYI, if an outage happens in the underlying infrastructure (below the OS level), AWS usually notifies you about it in advance, or sometimes afterwards.
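
The same status information can also be read programmatically. EC2 reports two separate checks: the system status check covers the AWS side (host hardware, network, power), while the instance status check covers the OS and its configuration. The sketch below, assuming boto3 is installed and credentials are configured, summarizes a `describe_instance_status` response; the helper names are hypothetical and the mapping in `likely_layer` is a rough rule of thumb, not a diagnosis.

```python
def summarize_status(response):
    """Map each instance ID to its state and the two status-check results."""
    summary = {}
    for item in response.get("InstanceStatuses", []):
        summary[item["InstanceId"]] = {
            "state": item["InstanceState"]["Name"],
            "system": item["SystemStatus"]["Status"],
            "instance": item["InstanceStatus"]["Status"],
        }
    return summary


def likely_layer(checks):
    """Rough hint at which layer to investigate first (a heuristic only)."""
    if checks["system"] == "impaired":
        return "underlying AWS host or network"
    if checks["instance"] == "impaired":
        return "operating system or instance configuration"
    return "no failed status check"


def fetch_status(instance_id, region):
    """Call the EC2 API; needs boto3 (third-party) and valid credentials."""
    import boto3

    ec2 = boto3.client("ec2", region_name=region)
    # IncludeAllInstances also returns instances that are not running.
    return ec2.describe_instance_status(
        InstanceIds=[instance_id], IncludeAllInstances=True
    )
```

With a placeholder instance ID and region, `summarize_status(fetch_status("i-0123456789abcdef0", "eu-central-1"))` would return one entry for that instance. If only the instance check is impaired while the system check is ok, that points away from a hardware fault.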

V
answered 8 months ago
