EC2 server instances unresponsive


On two successive days, I found that the same two instances (both t4g.small) had become unresponsive overnight and had to be stopped and started from the Amazon EC2 console. As I understand it, stopping and starting an instance usually moves it to different underlying hardware. The server logs are not accessible while the instances are unresponsive, so I have no idea why they are hanging, but I fear there may be an underlying hardware issue.

How can I handle this problem? What could cause the same two instances to hang repeatedly? The Instances view of the Management Console showed "1/2 checks passed" in red. Is there a way to have an instance automatically stopped and then started if a health check remains in a failed state for a certain length of time?

asked 2 years ago · 468 views
2 Answers

Thanks for your response. I have now created a CloudWatch alarm for each of the failing instances and selected Reboot as the associated action (because there was no option to stop and then start an instance). But I am not sure that a reboot will work when the instance freezes. I have also had to associate Elastic IP addresses with these two instances, because the public IP address of each instance changes whenever I stop and then start it. Is it normal for the public IP address of an instance to change like this after a stop/start, or is this peculiar to specific Availability Zones or Regions?

answered 2 years ago
  • Public IP addresses that are not Elastic IPs are released when the instance stops, and a new one is assigned when it starts. This is normal behavior and is not specific to any Region or Availability Zone. If you want a persistent public IP address for an instance, you must allocate an Elastic IP address (EIP) and associate it with the instance's network interface.

    In general, I recommend using EC2 Auto Scaling instead of CloudWatch alarms to manage instance health. You can create an Auto Scaling Group with a fixed size (for example, 2 in your case) and the service will automatically terminate and relaunch failed instances.
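
As a rough illustration of the fixed-size Auto Scaling group suggested above, here is a minimal boto3 sketch. The group name, launch template name, and subnet IDs are hypothetical placeholders, and it assumes a launch template for your instances already exists. The helper only builds the request parameters; the actual API call is in a separate function so you can inspect the values first.

```python
def build_fixed_size_group(name, launch_template_name, subnet_ids, size=2):
    """Build keyword arguments for Auto Scaling's create_auto_scaling_group.

    Min, max, and desired capacity are all set to the same value, so the
    group never scales; it only replaces instances it considers unhealthy.
    """
    return {
        "AutoScalingGroupName": name,
        "LaunchTemplate": {
            "LaunchTemplateName": launch_template_name,  # assumed to exist already
            "Version": "$Latest",
        },
        "MinSize": size,
        "MaxSize": size,
        "DesiredCapacity": size,
        # Subnet IDs are passed as one comma-separated string.
        "VPCZoneIdentifier": ",".join(subnet_ids),
        # "EC2" health checks rely on the EC2 status checks.
        "HealthCheckType": "EC2",
        "HealthCheckGracePeriod": 300,  # seconds to wait after launch
    }


def create_fixed_size_group(region, **kwargs):
    """Create the group (requires boto3 and AWS credentials)."""
    import boto3  # imported here so the builder above works without boto3

    autoscaling = boto3.client("autoscaling", region_name=region)
    autoscaling.create_auto_scaling_group(**build_fixed_size_group(**kwargs))
```

Keep in mind that an Auto Scaling group replaces a failed instance with a new one, so the replacement gets a new instance ID and does not keep an Elastic IP association or data on the old root volume unless you arrange for that yourself.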


Unfortunately, from time to time, an EC2 instance will fail - and sometimes more than one at once, if there is an incident that impacts an entire rack of servers or more.

There are multiple strategies for performing automated recovery of failed EC2 instances that are discussed in our documentation. Simplified automatic recovery works in many - but not all - circumstances. We recommend configuring CloudWatch Alarms to detect and recover when system status checks fail.
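
For reference, a CloudWatch alarm with a recover action might look roughly like the boto3 sketch below. The alarm name, instance ID, Region, and the two-minute evaluation window are illustrative assumptions, not recommended values. It watches the `StatusCheckFailed_System` metric, since the recover action is tied to the system status check; if it is the instance status check that fails, a reboot action on `StatusCheckFailed_Instance` may be the closer fit.

```python
def build_recovery_alarm(instance_id, region, minutes=2):
    """Build keyword arguments for CloudWatch's put_metric_alarm.

    The alarm fires when the system status check has failed for
    `minutes` consecutive one-minute periods, then asks EC2 to recover
    the instance.
    """
    return {
        "AlarmName": f"recover-{instance_id}",  # hypothetical naming scheme
        "Namespace": "AWS/EC2",
        "MetricName": "StatusCheckFailed_System",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Maximum",
        "Period": 60,                  # one-minute datapoints
        "EvaluationPeriods": minutes,  # consecutive failing periods required
        "Threshold": 1.0,              # the metric is 1 when the check fails
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "AlarmActions": [f"arn:aws:automate:{region}:ec2:recover"],
    }


def create_recovery_alarm(instance_id, region):
    """Create the alarm (requires boto3 and AWS credentials)."""
    import boto3  # imported here so the builder above works without boto3

    cloudwatch = boto3.client("cloudwatch", region_name=region)
    cloudwatch.put_metric_alarm(**build_recovery_alarm(instance_id, region))
```

You would call `create_recovery_alarm` once per instance, for example `create_recovery_alarm("i-0123456789abcdef0", "eu-west-1")` with your own instance ID and Region.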

If you have not terminated the failed instance - or if you have terminated it, but opted to preserve the EBS volumes associated with it when you created it - you may be able to locate the original EBS volume and attach it to an instance to examine the logs for troubleshooting purposes.

AWS
EXPERT
answered 2 years ago
