Unexpected stop on instance

Hello

One of our instances stopped during the night and then started up again over an hour later. I can see nothing in the dashboard to say there is/was an issue around this time in this region. Nor can I see anything in the CloudTrail logs.

Why did the instance stop suddenly and then start again over an hour later?

Instance: i-08763854cb703ee17
Stopped: 00:06
Started: 01:15
Region: eu-west-2

Asked 5 years ago, 334 views
1 Answer
Accepted Answer

Hello,

I am sorry to hear about the issue with your instance i-08763854cb703ee17.

I have checked the instance and can see that the underlying physical host on which it runs experienced hardware-related issues from 2019-03-14T00:09:00.000Z until 2019-03-14T01:16:00.000Z. This caused your instance to restart and to fail its status checks.

In the future, you can check whether an instance was affected by a hardware-related event by looking at its 'System Status Checks' [1]. The history of these checks can also be viewed in Amazon CloudWatch through the StatusCheckFailed_System metric [2,3].
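If you prefer to pull that history programmatically, here is a minimal sketch in Python. It assumes boto3 is installed and credentials are configured; the helper names (system_check_query, failed_minutes, fetch_failed_minutes) are made up for illustration and are not part of any AWS library. The only AWS call used is the CloudWatch get_metric_statistics API.

```python
from datetime import datetime, timezone


def system_check_query(instance_id, start, end):
    """Build the arguments for CloudWatch get_metric_statistics."""
    return {
        "Namespace": "AWS/EC2",
        "MetricName": "StatusCheckFailed_System",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "StartTime": start,
        "EndTime": end,
        "Period": 60,               # one datapoint per minute
        "Statistics": ["Maximum"],  # 1 means the check failed in that minute
    }


def failed_minutes(datapoints):
    """Return the sorted timestamps at which the system check failed."""
    return sorted(p["Timestamp"] for p in datapoints if p["Maximum"] >= 1)


def fetch_failed_minutes(instance_id, start, end, region):
    """Query CloudWatch and return the failing timestamps (hypothetical helper)."""
    import boto3  # assumption: boto3 is installed and credentials are set up

    cloudwatch = boto3.client("cloudwatch", region_name=region)
    response = cloudwatch.get_metric_statistics(
        **system_check_query(instance_id, start, end)
    )
    return failed_minutes(response["Datapoints"])
```

For the incident in this thread you would call fetch_failed_minutes("i-08763854cb703ee17", datetime(2019, 3, 14, 0, 0, tzinfo=timezone.utc), datetime(2019, 3, 14, 2, 0, tzinfo=timezone.utc), "eu-west-2"). CloudWatch keeps one-minute datapoints only for a limited time, so a query like this is best run soon after the event.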

Please accept our apologies for the above issue and for any inconvenience caused by it.

Please note that your instance is still hosted on the same physical host. Although the host seems to be healthy right now, you may want to stop and then start your instance. In most cases, a stop / start moves the instance to a different, healthy physical host [4] that was not affected by the hardware issues described above (note: a simple 'Reboot' does not work this way).
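The stop / start can be done from the console, or scripted. The following is a rough sketch, assuming boto3 and an EBS-backed instance; stop_then_start and main are illustrative names only. It uses the EC2 stop_instances and start_instances calls together with the built-in instance_stopped and instance_running waiters.

```python
def stop_then_start(ec2, instance_id):
    """Stop an instance, wait until it is stopped, then start it again.

    `ec2` is expected to be a boto3 EC2 client (or an object with the
    same methods, which makes the function easy to test).
    """
    ec2.stop_instances(InstanceIds=[instance_id])
    ec2.get_waiter("instance_stopped").wait(InstanceIds=[instance_id])
    ec2.start_instances(InstanceIds=[instance_id])
    ec2.get_waiter("instance_running").wait(InstanceIds=[instance_id])


def main():
    import boto3  # assumption: boto3 is installed and credentials are set up

    ec2 = boto3.client("ec2", region_name="eu-west-2")
    stop_then_start(ec2, "i-08763854cb703ee17")
```

Before running something like this, keep in mind that data on instance store volumes is lost on stop, and an auto-assigned public IPv4 address changes unless you use an Elastic IP.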

I would also suggest that you take a look at the Auto Recovery feature for Amazon EC2. You can create an Amazon CloudWatch alarm that monitors an Amazon EC2 instance and automatically recovers it if it becomes impaired due to an underlying hardware failure or a problem that requires AWS involvement to repair. In short, the alarm triggers when the System Status check fails, and it can in turn trigger an EC2 action such as "Recover this instance" [5,6].
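As a starting point, such an alarm could be created along these lines. This is only a sketch: it assumes boto3, and the alarm name, the two-minute evaluation window and the helper names are my own choices rather than recommended values. It uses the CloudWatch put_metric_alarm call with the EC2 recover action ARN for the instance's region.

```python
def recover_alarm_params(instance_id, region):
    """Build put_metric_alarm arguments for an auto-recovery alarm."""
    return {
        "AlarmName": f"auto-recover-{instance_id}",  # hypothetical name
        "AlarmDescription": "Recover the instance when the system status check fails",
        "Namespace": "AWS/EC2",
        "MetricName": "StatusCheckFailed_System",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Maximum",
        "Period": 60,            # evaluate one-minute datapoints
        "EvaluationPeriods": 2,  # assumption: two failing minutes in a row
        "Threshold": 1.0,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        # Built-in EC2 action that recovers the instance
        "AlarmActions": [f"arn:aws:automate:{region}:ec2:recover"],
    }


def create_recover_alarm(instance_id, region):
    """Create the alarm in CloudWatch (hypothetical helper)."""
    import boto3  # assumption: boto3 is installed and credentials are set up

    cloudwatch = boto3.client("cloudwatch", region_name=region)
    cloudwatch.put_metric_alarm(**recover_alarm_params(instance_id, region))
```

For this instance the call would be create_recover_alarm("i-08763854cb703ee17", "eu-west-2"). Not every instance type or configuration supports the recover action, so please check the documentation in [5] before relying on it.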

Please let us know if you need any further help.

Links:
[1] https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/monitoring-system-instance-status-check.html#types-of-instance-status-checks
[2] https://aws.amazon.com/blogs/aws/ec2-instance-status-metrics/
[3] https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/ec2-metricscollected.html#ec2-metrics
[4] http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/Stop_Start.html#instance_stop
[5] http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/UsingAlarmActions.html#AddingRecoverActions
[6] https://aws.amazon.com/blogs/aws/new-auto-recovery-for-amazon-ec2/

Regards,
awstomas

AWS
Answered 5 years ago