EC2 server instances unresponsive

On two successive days, the same two server instances (both t4g.small) became unresponsive overnight and had to be stopped and started from the AWS Management Console. I believe a Stop/Start moves the instance to different underlying server hardware. The server logs are not accessible while the instances are unresponsive, so I have no idea why they are hanging, but I fear there may be an underlying hardware issue.

How can I handle this problem? What could be the reasons for the same two instances hanging repeatedly? The message on the Instances view of the Management Console was "1/2 Checks passed" in red. Is there a way that a server instance can be automatically Stopped and then Started if a Health check remains in a failed state for a certain length of time?

asked 2 years ago, 461 views
2 Answers
Thanks for your response. I have now created CloudWatch Alarms for each of the failing instances and selected Reboot as the associated Action (because there was no option to Stop and then Start an instance). However, I am not sure that Reboot will work when the instance freezes. I have also had to associate Elastic IP addresses with these two instances because the Public IP address of each instance changes whenever I Stop and then Start it. Is it normal for the Public IP address of an instance to change like this after a Stop/Start, or is this peculiar to specific Availability Zones or Regions?

answered 2 years ago
  • This is normal in every Region. A public IP address that is not an Elastic IP is released when the instance stops, and a new one is assigned when it starts. If you want a persistent public IP address for an instance, you must allocate an Elastic IP (EIP) and associate it with the instance or its network interface.

    In general, I recommend using EC2 Auto Scaling instead of CloudWatch alarms to manage instance health. You can create an Auto Scaling Group with a fixed size (for example, 2 in your case) and the service will automatically terminate and relaunch failed instances.
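As a rough illustration of the fixed-size group described above, here is a minimal Python sketch that builds the arguments for the boto3 `create_auto_scaling_group` call. The group name, launch template name, and subnet IDs are hypothetical placeholders, and the sketch assumes a launch template already exists. Keep in mind that Auto Scaling replaces a failed instance with a new one rather than restarting it, so anything stored only on the old instance is not carried over.

```python
from typing import Any, Dict, List


def build_fixed_size_group(
    group_name: str,
    launch_template_name: str,
    subnet_ids: List[str],
    size: int,
) -> Dict[str, Any]:
    """Build keyword arguments for autoscaling.create_auto_scaling_group."""
    return {
        "AutoScalingGroupName": group_name,
        "LaunchTemplate": {
            "LaunchTemplateName": launch_template_name,
            "Version": "$Latest",
        },
        # Min, max, and desired are equal, so the group never scales;
        # it only replaces instances that fail health checks.
        "MinSize": size,
        "MaxSize": size,
        "DesiredCapacity": size,
        "HealthCheckType": "EC2",
        # The API expects the subnets as one comma-separated string.
        "VPCZoneIdentifier": ",".join(subnet_ids),
    }


def create_fixed_size_group(params: Dict[str, Any], region: str) -> None:
    """Send the request; needs boto3 and AWS credentials."""
    import boto3  # imported lazily so the builder works without boto3

    autoscaling = boto3.client("autoscaling", region_name=region)
    autoscaling.create_auto_scaling_group(**params)
```

For two instances, `build_fixed_size_group("web-fleet", "web-template", ["subnet-aaa", "subnet-bbb"], 2)` would produce the request, with all four values standing in for your own.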

Unfortunately, from time to time, an EC2 instance will fail - and sometimes more than one at once, if there is an incident that impacts an entire rack of servers or more.

There are multiple strategies for performing automated recovery of failed EC2 instances that are discussed in our documentation. Simplified automatic recovery works in many - but not all - circumstances. We recommend configuring CloudWatch Alarms to detect and recover when system status checks fail.
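To make the CloudWatch recommendation concrete, here is a minimal Python sketch that builds the arguments for the boto3 `put_metric_alarm` call, using the `StatusCheckFailed_System` metric and the EC2 recover action. The alarm naming scheme, the 60-second period, and the two evaluation periods are assumptions chosen for illustration, not recommended values. The recover action applies to system status check failures; if it is the instance status check that fails, an alarm on `StatusCheckFailed_Instance` with the reboot action may be the better fit.

```python
from typing import Any, Dict


def build_recover_alarm(instance_id: str, region: str) -> Dict[str, Any]:
    """Build keyword arguments for cloudwatch.put_metric_alarm."""
    return {
        "AlarmName": f"recover-{instance_id}",  # hypothetical naming scheme
        "Namespace": "AWS/EC2",
        "MetricName": "StatusCheckFailed_System",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Maximum",
        "Period": 60,            # seconds per evaluation period (assumed)
        "EvaluationPeriods": 2,  # consecutive failing periods (assumed)
        "Threshold": 1.0,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        # Built-in EC2 action that recovers the instance when the alarm fires.
        "AlarmActions": [f"arn:aws:automate:{region}:ec2:recover"],
    }


def create_recover_alarm(instance_id: str, region: str) -> None:
    """Send the request; needs boto3 and AWS credentials."""
    import boto3  # imported lazily so the builder works without boto3

    cloudwatch = boto3.client("cloudwatch", region_name=region)
    cloudwatch.put_metric_alarm(**build_recover_alarm(instance_id, region))
```

Calling `create_recover_alarm` once per instance, with your own instance ID and Region, would create one alarm for each of the two affected instances.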

If you have not terminated the failed instance - or if you have terminated it, but opted to preserve the EBS volumes associated with it when you created it - you may be able to locate the original EBS volume and attach it to an instance to examine the logs for troubleshooting purposes.
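The volume inspection described above could be sketched in Python as follows. The volume ID, rescue instance ID, and device name are hypothetical placeholders, and the sketch assumes the volume is already detached (or its original instance is stopped and the volume detached) and sits in the same Availability Zone as the rescue instance.

```python
from typing import Any, Dict


def build_attach_request(
    volume_id: str, rescue_instance_id: str, device: str = "/dev/sdf"
) -> Dict[str, Any]:
    """Build keyword arguments for ec2.attach_volume."""
    return {
        "VolumeId": volume_id,
        "InstanceId": rescue_instance_id,
        "Device": device,  # assumed free device name on the rescue instance
    }


def attach_for_inspection(params: Dict[str, Any], region: str) -> None:
    """Send the request; needs boto3 and AWS credentials."""
    import boto3  # imported lazily so the builder works without boto3

    ec2 = boto3.client("ec2", region_name=region)
    ec2.attach_volume(**params)
```

Once the volume is attached, you would still need to mount it from inside the rescue instance before you can read the log files.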

AWS
EXPERT
answered 2 years ago
