I want to set up automatic recovery of an Amazon Elastic Compute Cloud (Amazon EC2) instance using Amazon CloudWatch.
Short description
If your instance fails a system status check, then you can use a CloudWatch alarm action to automatically recover it. The recover action is available for over 90% of deployed Amazon EC2 instances. However, the recover action responds only to system status check failures, not to instance status check failures. Also, a terminated instance can't be recovered.
If your instance fails an instance status check, then you might need to reboot the instance or change its configuration. For more information, see Types of status checks.
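The rule above (system check failure: recover; instance check failure: reboot or reconfigure; terminated: no recovery) can be expressed as a small decision function. This is a minimal sketch, assuming the input is one entry of the `InstanceStatuses` list that the EC2 DescribeInstanceStatus API returns (`describe_instance_status` in boto3). The function name `choose_remediation` and its return labels are hypothetical.

```python
def choose_remediation(status_entry: dict) -> str:
    """Suggest a remediation for one DescribeInstanceStatus entry.

    Hypothetical helper: the return labels are illustrative, not AWS values.
    """
    state = status_entry["InstanceState"]["Name"]
    system = status_entry["SystemStatus"]["Status"]
    instance = status_entry["InstanceStatus"]["Status"]

    if state == "terminated":
        # A terminated instance can't be recovered.
        return "not-recoverable"
    if system == "impaired":
        # System status check failure: the CloudWatch recover action applies.
        return "recover"
    if instance == "impaired":
        # Instance status check failure: recover doesn't apply.
        return "reboot-or-reconfigure"
    return "none"


example = {
    "InstanceState": {"Name": "running"},
    "SystemStatus": {"Status": "impaired"},
    "InstanceStatus": {"Status": "ok"},
}
print(choose_remediation(example))  # recover
```

The order of the checks matters: the terminated state is tested first because no alarm action can act on a terminated instance.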
Resolution
Create an alarm
1. Open the Amazon EC2 console.
2. In the navigation pane, choose Instances.
3. Select the instance that you want to configure.
4. Choose Actions, and then choose Monitor and troubleshoot. Then, choose Manage CloudWatch alarms.
5. Choose Create an alarm.
Note: To create an alarm, you must have AWS Identity and Access Management (IAM) permissions to stop and start the associated instance. For more information, see Creating IAM roles.
6. For Alarm notification, choose an existing Amazon Simple Notification Service (Amazon SNS) topic. To create a new topic, see Creating an Amazon SNS topic. Note: To receive notifications when the alarm is triggered, you must have a confirmed subscription to the SNS topic.
7. Toggle on Alarm action, and then choose Recover.
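8. For Group samples by, choose a statistic (for example, Maximum). For Type of data to sample, choose Status check failed: system, because the recover action responds only to system status check failures.
9. For Consecutive period, enter the number of periods that the alarm evaluates, and for Period, choose the length of each period.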
10. (Optional) Modify the automatically created Alarm name.
11. Choose Create.
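The console steps above can also be scripted. This is a minimal sketch using the AWS SDK for Python (boto3) CloudWatch `put_metric_alarm` call, assuming the `StatusCheckFailed_System` metric with the Maximum statistic, a 60-second period, 2 evaluation periods, and a threshold of 1; tune these for your use case. The alarm name, region, instance ID, and SNS topic ARN are hypothetical placeholders, as are the helper names `build_recover_alarm_params` and `create_recover_alarm`.

```python
from typing import Optional


def build_recover_alarm_params(
    instance_id: str,
    region: str,
    sns_topic_arn: Optional[str] = None,
    period_seconds: int = 60,
    evaluation_periods: int = 2,
) -> dict:
    """Return keyword arguments for CloudWatch PutMetricAlarm (hypothetical helper)."""
    # The recover action is expressed as a regional "automate" ARN.
    actions = [f"arn:aws:automate:{region}:ec2:recover"]
    if sns_topic_arn:
        # Optional notification, like the Alarm notification console step.
        actions.append(sns_topic_arn)
    return {
        "AlarmName": f"recover-{instance_id}",  # hypothetical naming scheme
        "Namespace": "AWS/EC2",
        # Recovery responds to system status check failures only.
        "MetricName": "StatusCheckFailed_System",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Maximum",
        "Period": period_seconds,
        "EvaluationPeriods": evaluation_periods,
        "Threshold": 1.0,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "AlarmActions": actions,
    }


def create_recover_alarm(params: dict, region: str) -> None:
    """Send the alarm definition to CloudWatch (needs AWS credentials)."""
    import boto3  # imported here so building the params needs no AWS SDK

    boto3.client("cloudwatch", region_name=region).put_metric_alarm(**params)
```

Usage: `params = build_recover_alarm_params("i-1234567890abcdef0", "us-west-2")`, then `create_recover_alarm(params, "us-west-2")`. Keeping the parameter building separate from the API call lets you inspect or test the alarm definition without touching your account.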
Treat missing data as breaching
1. Open the CloudWatch console.
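2. In the navigation pane, choose Alarms, and then choose All alarms.
3. Select the alarm that you created. Choose Actions, and then choose Edit.
4. In the Additional configuration section, for Missing data treatment, choose Treat missing data as bad (breaching threshold). With this setting, missing status check data points count as failures, so the alarm can still act when the instance stops reporting the metric.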
5. Choose Save.
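To see why this setting matters, the following sketch models how missing data points affect the alarm state. It is a simplified model, not CloudWatch's exact evaluation algorithm: it assumes the alarm goes to ALARM only when every data point in the last N periods breaches, and it treats `None` as a missing data point. The helper names `evaluate` and `with_missing_data_breaching` are hypothetical. The second helper reflects the CloudWatch PutMetricAlarm behavior that an update overwrites the whole alarm configuration, so you resend every original argument plus `TreatMissingData`.

```python
from typing import List, Optional


def evaluate(datapoints: List[Optional[float]], evaluation_periods: int,
             threshold: float, treat_missing: str, previous_state: str) -> str:
    """Simplified alarm evaluation over the last N data points (hypothetical)."""
    if evaluation_periods < 1:
        raise ValueError("evaluation_periods must be at least 1")
    window = datapoints[-evaluation_periods:]
    if len(window) < evaluation_periods:
        return previous_state  # not enough data yet
    breaching = []
    for value in window:
        if value is None:
            if treat_missing == "breaching":
                breaching.append(True)   # missing counts as a failure
            elif treat_missing == "notBreaching":
                breaching.append(False)  # missing counts as healthy
            else:
                # "ignore" in this simplified model: keep the current state.
                return previous_state
        else:
            breaching.append(value >= threshold)
    return "ALARM" if all(breaching) else "OK"


def with_missing_data_breaching(params: dict) -> dict:
    """Copy PutMetricAlarm arguments and set TreatMissingData to breaching."""
    updated = dict(params)
    updated["TreatMissingData"] = "breaching"
    return updated


# The instance stopped reporting the metric for two periods.
print(evaluate([None, None], 2, 1.0, "breaching", "OK"))     # ALARM
print(evaluate([None, None], 2, 1.0, "notBreaching", "OK"))  # OK
```

In this model, only the breaching treatment moves the alarm to ALARM when data stops arriving, which is what lets the recover action run in that case.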
Related information
Create alarms that stop, terminate, reboot, or recover an instance