AWS EC2 servers crashed for an unknown reason. Status checks failed and the servers don't recover after reboot

0

Last month we experienced some unexpected behavior from our AWS EC2 servers and Lightsail services.

From Dec 13th to 14th, the CPU utilization of one of our AWS EC2 servers shot up, and later both of its status checks failed. We do not know the reason and have been unable to figure it out.

After that, on Dec 20th, another of our AWS EC2 servers became unresponsive and all of its status checks failed. We were unable to reboot the server and recover it to its normal state for a considerably long period of time; the server did not respond to the reboot request. We are unable to figure out the reason behind the outage. The CloudWatch metrics also show a gap while the servers were down. A screenshot is attached.

CloudWatch Screenshot

Recently, on Dec 30th, our Lightsail servers also crashed, and the reason is not known to us.

We are unable to figure out what exactly the problem is and we would like to know if any internal malfunctions happened in the servers.

  • This is tagged with Amazon Lightsail, but this looks like an EC2 instance. Please remove the Lightsail tag if it's not about that service

1 Answer
0

Hi,

If your instance fails both status checks (2/2), this generally means that the underlying host is impaired. That in turn causes a degraded experience, such as the one you mention, where you were unable to reboot the server and recover it for a considerably long period of time. The correct course of action once you notice a 2/2 status check failure is to stop and then start your instance. A reboot keeps the instance on the same host, whereas a stop followed by a start moves it to a different, healthy underlying host. See link [1], which explains the concept of status checks in more detail.
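As a rough illustration of the stop/start approach, here is a minimal Python sketch, assuming the boto3 SDK and an EBS-backed instance. The function names and the instance ID in the usage comment are hypothetical and only for illustration. Keep in mind that stopping an instance discards any instance store data, and the public IP address usually changes unless an Elastic IP is attached.

```python
def both_checks_failed(status: dict) -> bool:
    """Return True when the system and instance status checks are both 'impaired'."""
    return (
        status["SystemStatus"]["Status"] == "impaired"
        and status["InstanceStatus"]["Status"] == "impaired"
    )


def stop_start_if_impaired(ec2, instance_id: str) -> bool:
    """Stop and then start the instance if it fails 2/2 status checks.

    `ec2` is expected to be a boto3 EC2 client, e.g. boto3.client("ec2").
    Returns True if a stop/start cycle was performed, False otherwise.
    """
    resp = ec2.describe_instance_status(
        InstanceIds=[instance_id],
        IncludeAllInstances=True,  # also report instances that are not running
    )
    statuses = resp["InstanceStatuses"]
    if not statuses or not both_checks_failed(statuses[0]):
        return False

    # A stop followed by a start (unlike a reboot) lets EC2 place the
    # instance on different underlying hardware.
    ec2.stop_instances(InstanceIds=[instance_id])
    ec2.get_waiter("instance_stopped").wait(InstanceIds=[instance_id])
    ec2.start_instances(InstanceIds=[instance_id])
    ec2.get_waiter("instance_running").wait(InstanceIds=[instance_id])
    return True


# Usage (requires boto3 and AWS credentials; the instance ID is a placeholder):
#   import boto3
#   stop_start_if_impaired(boto3.client("ec2"), "i-0123456789abcdef0")
```

If the instance hangs in the stopping state, `stop_instances` also accepts `Force=True`, though a forced stop skips the normal shutdown and should be treated as a last resort.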

Let me know if there is anything specific you need assistance with; I will be more than happy to offer guidance.

[1] Status checks for your instances - https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/monitoring-system-instance-status-check.html

Lundi (AWS Support Engineer)
answered a year ago
