Unexpected stop on instance


Hello

One of our instances stopped during the night and then started up again over an hour later. I can see nothing in the dashboard to say there is/was an issue around this time in this region. Nor can I see anything in the CloudTrail logs.

Why did the instance stop suddenly and then start again over an hour later?

Instance: i-08763854cb703ee17
Stopped: 00:06
Started: 01:15
Region: eu-west-2

asked 5 years ago
1 Answer
Accepted Answer

Hello,

I am sorry to hear about the issue with your instance i-08763854cb703ee17.

I have checked the instance and can see that the underlying physical host on which your instance was running experienced hardware-related issues from 2019-03-14T00:09:00.000Z until 2019-03-14T01:16:00.000Z. This caused your instance to restart and to fail its status checks.

Please note that in the future you can check whether an instance was affected by a hardware-related event by checking its 'System Status Checks' [1]. The history of these checks can also be viewed in Amazon CloudWatch by looking at the StatusCheckFailed_System metric [2, 3].
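As a rough illustration of the CloudWatch check described above, the following Python sketch builds a query for the StatusCheckFailed_System metric and picks out the periods in which the check failed. It assumes the third-party boto3 library and configured credentials; the function names, the 5-minute period, and the time window are illustrative choices, not part of the original answer. CloudWatch only retains metric data for a limited time, so this is useful for recent events only.

```python
from datetime import datetime, timezone

# Illustrative window around the reported outage (an assumption; adjust as needed).
WINDOW_START = datetime(2019, 3, 13, 23, 0, tzinfo=timezone.utc)
WINDOW_END = datetime(2019, 3, 14, 3, 0, tzinfo=timezone.utc)


def system_check_query(instance_id, start, end, period=300):
    """Build keyword arguments for CloudWatch get_metric_statistics."""
    return {
        "Namespace": "AWS/EC2",
        "MetricName": "StatusCheckFailed_System",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "StartTime": start,
        "EndTime": end,
        "Period": period,  # seconds per datapoint
        "Statistics": ["Maximum"],  # Maximum > 0 means a failure in that period
    }


def failed_periods(datapoints):
    """Return sorted timestamps of datapoints where the system check failed."""
    return sorted(p["Timestamp"] for p in datapoints if p["Maximum"] > 0)


def fetch_failed_periods(instance_id, region, start, end):
    """Query CloudWatch and return the timestamps of failed system checks."""
    import boto3  # third-party; imported here so the helpers above work without it

    cloudwatch = boto3.client("cloudwatch", region_name=region)
    response = cloudwatch.get_metric_statistics(
        **system_check_query(instance_id, start, end)
    )
    return failed_periods(response["Datapoints"])


# Example call (requires AWS credentials):
# fetch_failed_periods("i-08763854cb703ee17", "eu-west-2", WINDOW_START, WINDOW_END)
```

An empty result means no failed system status check was recorded in the window, or the data has already aged out of CloudWatch.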

Please accept our apologies for the above issue and for any inconvenience caused by it.

Please note that your instance is still hosted on the same physical host. Although the host seems to be healthy right now, you may want to consider stopping and then starting your instance. As you may already be aware, in most cases a stop followed by a start moves the instance to a different, healthy physical host [4] that was not affected by the hardware issues described above (note: a simple 'Reboot' does not work this way).
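For reference, a stop followed by a start can be scripted along these lines. This is a minimal sketch, assuming an EBS-backed instance and a boto3 EC2 client passed in by the caller; the function name is made up for illustration. Keep in mind that stopping an instance discards instance store data and, unless an Elastic IP is attached, changes its public IPv4 address.

```python
def stop_then_start(ec2, instance_id):
    """Stop an instance, wait until it is stopped, then start it again.

    `ec2` is expected to be a boto3 EC2 client, e.g.
    boto3.client("ec2", region_name="eu-west-2").
    """
    ec2.stop_instances(InstanceIds=[instance_id])
    # Block until the instance reaches the 'stopped' state.
    ec2.get_waiter("instance_stopped").wait(InstanceIds=[instance_id])

    ec2.start_instances(InstanceIds=[instance_id])
    # Block until the instance reaches the 'running' state.
    ec2.get_waiter("instance_running").wait(InstanceIds=[instance_id])


# Example call (requires boto3 and AWS credentials):
# import boto3
# stop_then_start(boto3.client("ec2", region_name="eu-west-2"), "i-08763854cb703ee17")
```

Waiting for the 'stopped' state before starting matters here: a start request issued while the instance is still stopping is rejected.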

I would also suggest that you take a look at the Auto Recovery feature for Amazon EC2. You can create an Amazon CloudWatch alarm that monitors an Amazon EC2 instance and automatically recovers it if it becomes impaired due to an underlying hardware failure or a problem that requires AWS involvement to repair. In short, you set up a CloudWatch alarm that triggers when the System Status check fails, and that alarm in turn triggers an EC2 action such as "Recover this instance" [5, 6].
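A recovery alarm of this kind might be defined roughly as follows. This is a sketch under assumptions: it uses the third-party boto3 library, and the alarm name, the 60-second period, and the two evaluation periods are illustrative values rather than recommendations from the answer. Auto recovery is only available for supported instance types and configurations, so please check the documentation in [5] before relying on it.

```python
def recover_alarm_params(instance_id, region, evaluation_periods=2):
    """Build keyword arguments for CloudWatch put_metric_alarm.

    The alarm fires when StatusCheckFailed_System is above 0 for
    `evaluation_periods` consecutive 60-second periods, and then
    triggers the EC2 recover action for the instance.
    """
    return {
        "AlarmName": f"auto-recover-{instance_id}",  # hypothetical naming scheme
        "Namespace": "AWS/EC2",
        "MetricName": "StatusCheckFailed_System",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Maximum",
        "Period": 60,
        "EvaluationPeriods": evaluation_periods,
        "Threshold": 0.0,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [f"arn:aws:automate:{region}:ec2:recover"],
    }


def create_recover_alarm(instance_id, region):
    """Create the alarm in CloudWatch (requires AWS credentials)."""
    import boto3  # third-party; imported here so the helper above works without it

    cloudwatch = boto3.client("cloudwatch", region_name=region)
    cloudwatch.put_metric_alarm(**recover_alarm_params(instance_id, region))


# Example call:
# create_recover_alarm("i-08763854cb703ee17", "eu-west-2")
```

A recovered instance keeps its instance ID, private IP addresses, Elastic IP addresses, and instance metadata, so in most cases nothing needs to be reconfigured afterwards.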

Please let us know if you need any further help.

Links:
[1] https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/monitoring-system-instance-status-check.html#types-of-instance-status-checks
[2] https://aws.amazon.com/blogs/aws/ec2-instance-status-metrics/
[3] https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/ec2-metricscollected.html#ec2-metrics
[4] http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/Stop_Start.html#instance_stop
[5] http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/UsingAlarmActions.html#AddingRecoverActions
[6] https://aws.amazon.com/blogs/aws/new-auto-recovery-for-amazon-ec2/

Regards,
awstomas

AWS
answered 5 years ago
