AWS EC2 servers crashed for an unknown reason. Status checks failed and do not recover after reboot

Last month we experienced some unexpected behavior from our AWS EC2 servers and Lightsail services.

From Dec 13th to 14th, CPU utilization on one of our AWS EC2 servers shot up, and later both of its status checks failed for reasons that we do not know and have been unable to determine.

After that, on Dec 20th, another of our AWS EC2 servers became unresponsive and all of its status checks failed. For a considerable period of time we were unable to reboot the server and recover it to its normal state; the server did not respond to the reboot request. We are unable to figure out the reason behind the outage. Even the CloudWatch metrics show a gap while the servers were down. A screenshot is attached.

[CloudWatch screenshot]

Most recently, on Dec 30th, our Lightsail servers also crashed, and the reason is not known to us.

We are unable to figure out what exactly the problem is, and we would like to know whether any internal malfunction occurred on the servers.

  • This is tagged with Amazon Lightsail, but this looks like an EC2 instance. Please remove the Lightsail tag if it's not about that service

1 Answer
Hi,

If your instance fails both status checks (2/2), this generally means that the underlying host is impaired. That in turn causes a degraded experience, which matches what you describe: being unable to reboot the server and recover it for a considerably long period of time. The correct course of action once you see a 2/2 status check failure is to STOP and then START your instance. This moves the instance to a different, healthy underlying host, which a reboot does not do. See link [1], which explains the concept of status checks in more detail.
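
The check-then-stop/start sequence described above can also be scripted. The following is a minimal sketch in Python, assuming the boto3 EC2 client; the function names `failed_checks`, `stop_start`, and `recover_if_impaired` are hypothetical, and the instance ID in the usage comment is a placeholder.

```python
def failed_checks(ec2, instance_id):
    """Return which status checks ("system", "instance") are impaired."""
    resp = ec2.describe_instance_status(
        InstanceIds=[instance_id],
        IncludeAllInstances=True,  # also report instances that are not running
    )
    statuses = resp["InstanceStatuses"]
    if not statuses:
        return []
    status = statuses[0]
    failed = []
    if status["SystemStatus"]["Status"] == "impaired":
        failed.append("system")
    if status["InstanceStatus"]["Status"] == "impaired":
        failed.append("instance")
    return failed


def stop_start(ec2, instance_id):
    """Stop the instance, wait until it is stopped, then start it again."""
    ec2.stop_instances(InstanceIds=[instance_id])
    ec2.get_waiter("instance_stopped").wait(InstanceIds=[instance_id])
    ec2.start_instances(InstanceIds=[instance_id])
    ec2.get_waiter("instance_running").wait(InstanceIds=[instance_id])


def recover_if_impaired(ec2, instance_id):
    """Stop/start the instance only when both status checks have failed."""
    if len(failed_checks(ec2, instance_id)) == 2:
        stop_start(ec2, instance_id)
        return True
    return False


# Usage (requires boto3 and AWS credentials; the instance ID is a placeholder):
#   import boto3
#   recover_if_impaired(boto3.client("ec2"), "i-0123456789abcdef0")
```

Keep in mind that a stop/start, unlike a reboot, releases an auto-assigned public IPv4 address unless an Elastic IP is attached, and any data on instance store volumes is lost, so check both before automating this.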

Let me know if there is anything specific you need help with; I will be more than happy to offer guidance.

[1] Status checks for your instances - https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/monitoring-system-instance-status-check.html

Lundi, AWS Support Engineer
answered a year ago
