Why did my Amazon EC2 instance fail before the instance degradation deadline?
My Amazon Elastic Compute Cloud (Amazon EC2) instance is scheduled for an instance degradation event, but it's failing status checks before the deadline. Or, I can't connect to my EC2 instance before its instance degradation deadline.
When AWS detects an irreparable failure in the underlying hardware that hosts an EC2 instance, AWS schedules the instance for retirement. When the instance reaches its scheduled retirement date, AWS either stops or terminates the instance based on the type of root device. For more information, see Instance retirement and Scheduled events.
However, an EC2 instance can fail and become unreachable before this deadline because of a hardware failure on the underlying host. Although AWS monitors the underlying hardware that hosts instances and schedules retirement for irreparable issues, failures can occur without prior indication.
Note: If your instance is scheduled for retirement, then take action as soon as possible because the instance might become unreachable before the scheduled date.
When an EC2 instance is scheduled for retirement, AWS sends an email to the email address that's associated with your account. The email provides details about the scheduled event, and includes its start and end dates. Depending on the type of event, you might be able to take action before this date.
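In addition to the email, you can read the scheduled event and its dates from the Amazon EC2 API. The following is a minimal sketch, not an official procedure. It assumes that `ec2_client` is a boto3 EC2 client created for the instance's Region, and the helper name `list_scheduled_events` is hypothetical.

```python
def list_scheduled_events(ec2_client, instance_id):
    """Return the scheduled events that EC2 reports for one instance.

    Assumption: ec2_client is a boto3 EC2 client for the instance's Region,
    for example boto3.client("ec2", region_name="us-east-1").
    """
    response = ec2_client.describe_instance_status(
        InstanceIds=[instance_id],
        IncludeAllInstances=True,  # also report instances that aren't running
    )
    events = []
    for status in response.get("InstanceStatuses", []):
        for event in status.get("Events", []):
            events.append(
                {
                    "code": event.get("Code"),  # for example, "instance-stop"
                    "description": event.get("Description"),
                    "not_before": event.get("NotBefore"),  # earliest start
                    "not_after": event.get("NotAfter"),  # latest end
                }
            )
    return events
```

An empty list means that the API reports no scheduled events for that instance in that Region. It doesn't mean that the underlying host is healthy.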
Amazon EC2 resources are Region-specific. If AWS notifies you of an instance degradation event in the AWS Health Dashboard in a Region, then use the following steps:
Check if your instance has an Amazon Elastic Block Store (Amazon EBS) volume as its root device. If so, then stop and start your EC2 instance to complete the event. The instance then migrates to a new underlying host.
Important: If you stop and start an instance, then the public IP address of the instance changes. It's a best practice to use an Elastic IP address instead of a public IP address when you route external traffic to your instance.
Note: The Force stop instance option is available in the Amazon EC2 console only when your instance is in the stopping state. If your instance is in another state (except shutting down and terminated), then use the AWS CLI to force stop your instance.
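The root device check and the stop and start described above can be sketched in code. This is an illustrative sketch under stated assumptions: `ec2_client` is a boto3 EC2 client for the instance's Region, and the function name `stop_and_start` is hypothetical. The `force` flag maps to the `Force` parameter of `stop_instances`, which is the API equivalent of a force stop.

```python
def stop_and_start(ec2_client, instance_id, force=False):
    """Stop and then start an EBS-backed instance so it moves to a new host.

    Assumption: ec2_client is a boto3 EC2 client for the instance's Region.
    """
    reservations = ec2_client.describe_instances(InstanceIds=[instance_id])[
        "Reservations"
    ]
    instance = reservations[0]["Instances"][0]

    # Data on instance store volumes doesn't survive a stop, so only
    # continue when the root device is an EBS volume.
    if instance["RootDeviceType"] != "ebs":
        raise ValueError(f"{instance_id} doesn't have an EBS root device")

    # A forced stop gives the operating system no chance to flush file
    # system caches, so use force=True only when a normal stop hangs.
    ec2_client.stop_instances(InstanceIds=[instance_id], Force=force)
    ec2_client.get_waiter("instance_stopped").wait(InstanceIds=[instance_id])

    ec2_client.start_instances(InstanceIds=[instance_id])
    ec2_client.get_waiter("instance_running").wait(InstanceIds=[instance_id])
```

The function waits for the stopped state before it starts the instance because a start request fails while the instance is still stopping.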
To avoid unexpected downtime caused by underlying failure, use the following steps:
Create an Amazon CloudWatch alarm that monitors your EC2 instance. Configure the alarm to automatically recover the instance if it's impaired by an underlying hardware failure.
Your EC2 instance might be based on an instance store or an ephemeral root volume. In this case, you might face issues if your instance is scheduled to stop instead of reboot. It's a best practice to launch replacement instances from your most recent AMI. Then migrate all necessary data to the replacement instance before the instance is scheduled to terminate. Then terminate the original instance or wait for it to terminate as scheduled.
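The CloudWatch recovery alarm from the steps above can be sketched as follows. This is one possible configuration, not a recommended one. It assumes that `cloudwatch_client` is a boto3 CloudWatch client for the instance's Region. The helper names, the alarm name, and the period and threshold values are illustrative choices that you should adjust.

```python
def build_recovery_alarm(instance_id, region):
    """Build put_metric_alarm parameters for an instance recovery alarm."""
    return {
        "AlarmName": f"recover-{instance_id}",  # hypothetical naming scheme
        "Namespace": "AWS/EC2",
        # The system status check reports problems with the underlying host.
        "MetricName": "StatusCheckFailed_System",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Maximum",
        "Period": 60,  # seconds; assumed value
        "EvaluationPeriods": 2,  # assumed value
        "Threshold": 1.0,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        # Built-in alarm action that asks EC2 to recover the instance.
        "AlarmActions": [f"arn:aws:automate:{region}:ec2:recover"],
    }


def create_recovery_alarm(cloudwatch_client, instance_id, region):
    """Create the alarm with a boto3 CloudWatch client (assumption)."""
    cloudwatch_client.put_metric_alarm(
        **build_recovery_alarm(instance_id, region)
    )
```

The sketch keeps the parameters in a separate function so that you can review them before anything is created in your account.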
If the EC2 instance fails and is then recovered or restarted successfully before the deadline, then the instance degradation event is complete. The instance has already migrated to new healthy hardware, and no further action is needed.