EC2 server instances unresponsive

On two successive days, the same two server instances (both t4g.small) became unresponsive overnight and had to be Stopped and Started from the Amazon Console. I believe a Stop/Start causes the instance to be placed on different underlying hardware. The server logs are not accessible while the instances are unresponsive, so I have no idea why they are hanging, but I fear there may be an underlying hardware issue.

How can I handle this problem? What could be the reasons for the same two instances hanging repeatedly? The message on the Instances view of the Management Console was "1/2 Checks passed" in red. Is there a way that a server instance can be automatically Stopped and then Started if a Health check remains in a failed state for a certain length of time?

Asked 2 years ago, 468 views
2 Answers
Thanks for your response. I have now created CloudWatch Alarms for each of the failing instances and selected Reboot as the associated Action (because there was no option to Stop and then Start an instance). But I am not sure that Reboot will work when the instance freezes. I have also had to associate Elastic IP addresses with these two instances because the Public IP address of each instance changes whenever I Stop and then Start it. Is it normal for the Public IP address of an instance to change like this after Stop/Start, or is this peculiar to specific Availability Zones/Regions?
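
Since the alarm Actions offer no combined Stop-then-Start, one possible workaround is to script the cycle. The following is only a minimal sketch: the function name `stop_then_start` is made up for illustration, `ec2_client` is assumed to behave like `boto3.client("ec2")`, and the instance ID in the usage comment is a placeholder.

```python
def stop_then_start(ec2_client, instance_id, force=False):
    """Stop an instance, wait for 'stopped', then start it and wait for 'running'.

    ec2_client is assumed to behave like boto3.client("ec2").
    """
    # Force=True asks EC2 to stop the instance without a clean OS shutdown.
    # That may be needed when the guest is frozen, at the risk of losing
    # file system data that was not yet flushed to disk.
    ec2_client.stop_instances(InstanceIds=[instance_id], Force=force)
    ec2_client.get_waiter("instance_stopped").wait(InstanceIds=[instance_id])

    ec2_client.start_instances(InstanceIds=[instance_id])
    ec2_client.get_waiter("instance_running").wait(InstanceIds=[instance_id])


# Hypothetical usage (needs the third-party boto3 package and AWS credentials):
#   import boto3
#   stop_then_start(boto3.client("ec2"), "i-0123456789abcdef0", force=True)
```

Something still has to trigger this function when a status check stays failed, for example a scheduled job or an alarm notification; that wiring is not shown here.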

Answered 2 years ago
  • Public IP addresses that are not Elastic IPs are released when the instance stops, and a new one is assigned when it starts; this is normal in every Region and Availability Zone. If you want a persistent public IP address, allocate an Elastic IP (EIP) and associate it with the instance or its network interface.

    In general, I recommend using EC2 Auto Scaling instead of CloudWatch alarms to manage instance health. You can create an Auto Scaling Group with a fixed size (for example, 2 in your case) and the service will automatically terminate and relaunch failed instances.
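
As a rough illustration of the fixed-size group suggested above, the sketch below only builds the request parameters. The helper name `fixed_size_group`, the group name, the launch template name, and the subnet IDs are all assumptions for the example; a launch template describing the instance must already exist.

```python
def fixed_size_group(name, launch_template_name, subnet_ids, size=2):
    """Build parameters for an Auto Scaling group pinned to a fixed size."""
    return {
        "AutoScalingGroupName": name,
        "LaunchTemplate": {
            "LaunchTemplateName": launch_template_name,
            "Version": "$Latest",
        },
        # Min = Max = Desired keeps the group at exactly `size` instances.
        "MinSize": size,
        "MaxSize": size,
        "DesiredCapacity": size,
        # Comma-separated subnet IDs, as the API expects.
        "VPCZoneIdentifier": ",".join(subnet_ids),
        # Replace instances that fail EC2 status checks.
        "HealthCheckType": "EC2",
        "HealthCheckGracePeriod": 300,  # seconds; illustrative value
    }


# Hypothetical usage (needs the third-party boto3 package and AWS credentials):
#   import boto3
#   params = fixed_size_group("web-fixed", "web-template", ["subnet-aaa", "subnet-bbb"])
#   boto3.client("autoscaling").create_auto_scaling_group(**params)
```

Note that a replacement launched by the group is a new instance, so anything stored only on the old root volume, and any Elastic IP association, does not carry over automatically.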

Unfortunately, from time to time, an EC2 instance will fail - and sometimes more than one at once, if there is an incident that impacts an entire rack of servers or more.

There are multiple strategies for performing automated recovery of failed EC2 instances that are discussed in our documentation. Simplified automatic recovery works in many - but not all - circumstances. We recommend configuring CloudWatch Alarms to detect and recover when system status checks fail.
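
A status-check alarm of the kind recommended above might be defined as follows. This is a sketch under assumptions: the helper name `status_check_alarm`, the alarm naming scheme, and the two-minute window are illustrative choices. It pairs the recover action with the system status check and the reboot action with the instance status check.

```python
# Metric that each action is paired with in this sketch.
_METRIC_FOR_ACTION = {
    "recover": "StatusCheckFailed_System",    # problem on the underlying host
    "reboot": "StatusCheckFailed_Instance",   # problem inside the guest
}


def status_check_alarm(instance_id, region, action="recover", minutes=2):
    """Build put_metric_alarm parameters for an EC2 status-check alarm."""
    if action not in _METRIC_FOR_ACTION:
        raise ValueError(f"unsupported action: {action!r}")
    return {
        "AlarmName": f"{action}-{instance_id}",
        "Namespace": "AWS/EC2",
        "MetricName": _METRIC_FOR_ACTION[action],
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Maximum",
        "Period": 60,                  # one-minute datapoints
        "EvaluationPeriods": minutes,  # consecutive failing minutes required
        "Threshold": 1.0,              # the metric is 1 while the check fails
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "AlarmActions": [f"arn:aws:automate:{region}:ec2:{action}"],
    }


# Hypothetical usage (needs the third-party boto3 package and AWS credentials):
#   import boto3
#   params = status_check_alarm("i-0123456789abcdef0", "eu-west-1")
#   boto3.client("cloudwatch", region_name="eu-west-1").put_metric_alarm(**params)
```

The console message "1/2 Checks passed" does not say which check failed, so it may be worth creating both alarms until the failing check is known.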

If you have not terminated the failed instance - or if you have terminated it, but opted to preserve the EBS volumes associated with it when you created it - you may be able to locate the original EBS volume and attach it to an instance to examine the logs for troubleshooting purposes.
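
Moving the original volume to a healthy instance for log inspection could look roughly like this. The function name `move_volume_for_inspection`, the device name, and the IDs in the usage comment are assumptions; `ec2_client` is assumed to behave like `boto3.client("ec2")`. The failed instance should be stopped before its root volume is detached, and the volume must be in the same Availability Zone as the instance it is attached to.

```python
def move_volume_for_inspection(ec2_client, volume_id, rescue_instance_id,
                               device="/dev/sdf", detach_first=True):
    """Attach an existing EBS volume to a healthy instance as a secondary disk."""
    if detach_first:
        # Only needed when the volume is still attached to the (stopped)
        # failed instance; skip it if that instance was already terminated.
        ec2_client.detach_volume(VolumeId=volume_id)
    ec2_client.get_waiter("volume_available").wait(VolumeIds=[volume_id])

    ec2_client.attach_volume(
        VolumeId=volume_id,
        InstanceId=rescue_instance_id,
        Device=device,
    )
    ec2_client.get_waiter("volume_in_use").wait(VolumeIds=[volume_id])


# Hypothetical usage (needs the third-party boto3 package and AWS credentials):
#   import boto3
#   move_volume_for_inspection(boto3.client("ec2"),
#                              "vol-0123456789abcdef0", "i-0fedcba9876543210")
```

After the attach completes, the volume still has to be mounted from inside the healthy instance before the logs can be read.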

AWS
Expert
Answered 2 years ago
