Unexpected stop on instance


Hello

One of our instances stopped during the night and then started up again over an hour later. I can see nothing in the dashboard to say there is/was an issue around this time in this region. Nor can I see anything in the CloudTrail logs.

Why did the instance stop suddenly and then start again over an hour later?

Instance: i-08763854cb703ee17
Stopped: 00:06
Started: 01:15
Region: eu-west-2

Asked 5 years ago, 333 views
1 Answer
Accepted Answer

Hello,

I am sorry to hear about the issue with your instance i-08763854cb703ee17.

I have checked the instance and can see that the underlying physical host on which your instance was running experienced hardware-related issues from 2019-03-14T00:09:00.000Z until 2019-03-14T01:16:00.000Z. This caused your instance to restart and to fail its status checks.

In future, you can check whether an instance was affected by a hardware-related event by looking at its system status checks [1]. The history of these checks can also be viewed in Amazon CloudWatch through the StatusCheckFailed_System metric [2,3].
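
As a rough illustration of the CloudWatch check, here is a minimal Python sketch that asks for the StatusCheckFailed_System metric over a time window and returns the periods in which the check failed. It assumes a boto3 CloudWatch client is passed in; the helper names `build_status_check_query` and `failed_periods`, the 5-minute period, and the choice of the `Maximum` statistic are illustrative assumptions, not something prescribed by the documentation linked below.

```python
def build_status_check_query(instance_id, start, end, period=300):
    """Build keyword arguments for CloudWatch get_metric_statistics.

    period=300 (5 minutes) is an assumed default, not a requirement.
    """
    return {
        "Namespace": "AWS/EC2",
        "MetricName": "StatusCheckFailed_System",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "StartTime": start,
        "EndTime": end,
        "Period": period,
        # The metric is 0 when the check passes and 1 when it fails,
        # so a Maximum above 0 means at least one failure in the period.
        "Statistics": ["Maximum"],
    }


def failed_periods(cloudwatch, instance_id, start, end):
    """Return sorted timestamps of periods in which the system check failed.

    `cloudwatch` is expected to behave like boto3.client("cloudwatch").
    """
    response = cloudwatch.get_metric_statistics(
        **build_status_check_query(instance_id, start, end)
    )
    return sorted(
        point["Timestamp"]
        for point in response["Datapoints"]
        if point["Maximum"] > 0
    )
```

With boto3 installed and credentials configured, this could be called as `failed_periods(boto3.client("cloudwatch", region_name="eu-west-2"), "i-08763854cb703ee17", start, end)` with timezone-aware `datetime` values for `start` and `end`. CloudWatch only retains metric data for a limited time, so an older window may return no data points.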

Please accept our apologies for the above issue and for any inconvenience caused by it.

Please note that your instance is still hosted on the same physical host. Although the host seems to be healthy right now, you may want to stop and then start your instance. In most cases, a stop/start moves the instance to a different, healthy physical host [4] that was not affected by the hardware issues described above. Note that a simple 'Reboot' does not do this.
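
If you prefer to script the stop/start rather than use the console, a minimal Python sketch could look like the following. It assumes a boto3 EC2 client is passed in; the helper name `stop_then_start` is hypothetical, and the sketch does no error handling.

```python
def stop_then_start(ec2, instance_id):
    """Stop an instance, wait until it is stopped, then start it again.

    `ec2` is expected to behave like boto3.client("ec2").
    Caution: stopping an instance discards instance store data and
    releases an auto-assigned public IPv4 address (an Elastic IP is kept).
    """
    ec2.stop_instances(InstanceIds=[instance_id])
    # Block until the instance reaches the "stopped" state.
    ec2.get_waiter("instance_stopped").wait(InstanceIds=[instance_id])

    ec2.start_instances(InstanceIds=[instance_id])
    # Block until the instance reaches the "running" state.
    ec2.get_waiter("instance_running").wait(InstanceIds=[instance_id])
```

With boto3 installed and credentials configured, this could be called as `stop_then_start(boto3.client("ec2", region_name="eu-west-2"), "i-08763854cb703ee17")`.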

I would also suggest taking a look at the Auto Recovery feature for Amazon EC2. You can create an Amazon CloudWatch alarm that monitors an Amazon EC2 instance and automatically recovers it if it becomes impaired due to an underlying hardware failure or a problem that requires AWS involvement to repair. In short, you set up a CloudWatch alarm that triggers when the System Status check fails, and the alarm then triggers an EC2 action such as "Recover this instance" [5,6].
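
To make the alarm setup concrete, here is a minimal Python sketch that creates such an alarm with the recover action. It assumes a boto3 CloudWatch client is passed in; the helper names, the alarm name, and the thresholds (two consecutive 1-minute periods) are illustrative assumptions that you should adapt, and not every instance type or configuration supports the recover action.

```python
def build_recover_alarm(instance_id, region):
    """Build keyword arguments for CloudWatch put_metric_alarm.

    The alarm name and evaluation settings are assumed values.
    """
    return {
        "AlarmName": f"auto-recover-{instance_id}",
        "Namespace": "AWS/EC2",
        "MetricName": "StatusCheckFailed_System",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Maximum",
        "Period": 60,
        "EvaluationPeriods": 2,
        # The metric is 1 while the system status check is failing.
        "Threshold": 1.0,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        # EC2 "recover" action for the given region.
        "AlarmActions": [f"arn:aws:automate:{region}:ec2:recover"],
    }


def create_recover_alarm(cloudwatch, instance_id, region):
    """Create the alarm and return its name.

    `cloudwatch` is expected to behave like boto3.client("cloudwatch").
    """
    params = build_recover_alarm(instance_id, region)
    cloudwatch.put_metric_alarm(**params)
    return params["AlarmName"]
```

With boto3 installed and credentials configured, this could be called as `create_recover_alarm(boto3.client("cloudwatch", region_name="eu-west-2"), "i-08763854cb703ee17", "eu-west-2")`.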

Please let us know if you need any further help.

Links:
[1] https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/monitoring-system-instance-status-check.html#types-of-instance-status-checks
[2] https://aws.amazon.com/blogs/aws/ec2-instance-status-metrics/
[3] https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/ec2-metricscollected.html#ec2-metrics
[4] http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/Stop_Start.html#instance_stop
[5] http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/UsingAlarmActions.html#AddingRecoverActions
[6] https://aws.amazon.com/blogs/aws/new-auto-recovery-for-amazon-ec2/

Regards,
awstomas

AWS
Answered 5 years ago
