You say: "and of course there's no one to ask. Is that all I can do?" Asking here isn't going to get you a definitive answer.
The correct path forward is to open a case with AWS Support, give them your instance ID and the approximate time you rebooted, and they will be able to tell you conclusively whether there was a failure on the host your instance was running on at the time.
Bear in mind that an EC2 instance is still a virtual machine running on a hypervisor on a physical host, and from time to time any hardware can fail. That is especially relevant here: you say the machine has been up for seven-odd years, so it has likely been on the same host the whole time. You can read about the AWS EC2 SLAs here: https://aws.amazon.com/compute/sla/. Note that for a single instance, AWS offers the following SLA:
For each individual Amazon EC2 instance (“Single EC2 Instance”), AWS will use commercially reasonable efforts to make the Single EC2 Instance available with an Instance-Level Uptime Percentage of at least 99.5%, in each case during any monthly billing cycle (the “Instance-Level SLA”).
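To put the 99.5% figure in perspective, here is a rough sketch of the arithmetic. It simply treats the allowed downtime as the remaining 0.5% of the hours in a billing cycle, which works out to about 3.6 hours in a 30-day month. This is a simplification: the SLA document itself defines how unavailability is measured, and the function name below is just for illustration.

```python
def allowed_downtime_hours(uptime_percent: float, days_in_cycle: int) -> float:
    """Hours of unavailability that would still meet the given uptime percentage.

    Simplified: assumes the percentage applies to every hour of the cycle.
    """
    total_hours = days_in_cycle * 24
    return total_hours * (100.0 - uptime_percent) / 100.0


if __name__ == "__main__":
    # Monthly billing cycles vary in length, so show a few cases.
    for days in (28, 30, 31):
        print(f"{days}-day cycle: {allowed_downtime_hours(99.5, days):.2f} h")
```

In other words, a single instance being unavailable for a few hours in a month can still fall within the Instance-Level SLA.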
Issuing a stop followed by a start through the API will usually cause EC2 to move the instance to a new host, which then makes the issue go away.
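A minimal sketch of that stop/start sequence, assuming you use boto3 (the AWS SDK for Python) and that the instance is EBS-backed (instance-store-backed instances cannot be stopped). The helper name `stop_then_start` is hypothetical; the `ec2` argument is expected to be a boto3 EC2 client, and the `force` flag maps to the `Force` parameter of `StopInstances`, which is the API-side equivalent of the forced power-off mentioned below.

```python
from typing import Any


def stop_then_start(ec2: Any, instance_id: str, force: bool = False) -> None:
    """Stop an EBS-backed instance, wait until it is stopped, then start it.

    `ec2` is assumed to be a boto3 EC2 client, e.g. boto3.client("ec2").
    """
    ids = [instance_id]
    # Force=True skips the graceful ACPI shutdown, like holding the power button.
    ec2.stop_instances(InstanceIds=ids, Force=force)
    # The instance can only be started again once it has fully stopped.
    ec2.get_waiter("instance_stopped").wait(InstanceIds=ids)
    ec2.start_instances(InstanceIds=ids)
    # Block until EC2 reports the instance as running again (likely on a new host).
    ec2.get_waiter("instance_running").wait(InstanceIds=ids)
```

Usage would look something like `stop_then_start(boto3.client("ec2", region_name="us-east-1"), "i-0123456789abcdef0")`, where the region and instance ID are placeholders. Note that a plain reboot does not move the instance; only a full stop and start does.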
It doesn't sound like there was any problem on AWS's side. If they'd had a major DHCP outage, a lot of customers would have noticed.
You said the initial restart didn't help. Did you check whether Ubuntu actually restarted and then failed to get an IP? You also said you had to force a power-off, which strongly suggests that Ubuntu wasn't responding to the ACPI signal EC2 sends to request a graceful shutdown. If Ubuntu wasn't even responding to a power-off signal (the equivalent of pressing the physical power button on a hardware machine), it was probably in a bad enough state that it wouldn't have responded to a reboot signal either.
As a safety net, you could set up a CloudWatch alarm that detects failing instance health checks and triggers a reboot or a recovery (equivalent to a power-off followed by a power-on) via the EC2 API. Instructions for the recover action: https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/UsingAlarmActions.html#AddingRecoverActions
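As a rough illustration, the sketch below builds the parameters for such an alarm as a plain dictionary, so you can inspect it before sending it anywhere. It assumes the recover action is paired with the `StatusCheckFailed_System` metric and the reboot action with `StatusCheckFailed_Instance`; the helper name, alarm naming scheme, period, evaluation count, and statistic are illustrative choices rather than values taken from the question, so adjust them to your needs.

```python
def status_check_alarm_params(instance_id: str, region: str, action: str = "recover") -> dict:
    """Build keyword arguments for CloudWatch PutMetricAlarm (hypothetical helper).

    action: "recover" (system status check) or "reboot" (instance status check).
    """
    metrics = {
        "recover": "StatusCheckFailed_System",
        "reboot": "StatusCheckFailed_Instance",
    }
    if action not in metrics:
        raise ValueError(f"unsupported action: {action!r}")
    return {
        "AlarmName": f"{action}-{instance_id}",
        "Namespace": "AWS/EC2",
        "MetricName": metrics[action],
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        # Fire when the check has failed (value > 0) for two consecutive minutes.
        "Statistic": "Minimum",
        "Period": 60,
        "EvaluationPeriods": 2,
        "Threshold": 0.0,
        "ComparisonOperator": "GreaterThanThreshold",
        # EC2 alarm actions are addressed with arn:aws:automate:<region>:ec2:<action>.
        "AlarmActions": [f"arn:aws:automate:{region}:ec2:{action}"],
    }
```

With boto3 you could then create the alarm with something like `boto3.client("cloudwatch", region_name=region).put_metric_alarm(**params)`.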
If Ubuntu generally has trouble obtaining a DHCP lease at startup but is able to renew the lease while it's running, you could also consider configuring Ubuntu with a static IP instead of using DHCP. The private IP of the primary ENI remains unchanged for the lifetime of the EC2 instance.
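A minimal sketch of what that static configuration could look like, assuming a netplan-based Ubuntu (18.04 or later; older releases use `/etc/network/interfaces` instead). The function below just renders the netplan YAML from the instance's current private IP and prefix. It relies on the fact that, in a VPC, the subnet's base address plus one is reserved for the VPC router. The interface name (check `ip link`; it is often `ens5` or `eth0`) and the DNS server (use the one your current DHCP lease provides) are assumptions you need to fill in.

```python
import ipaddress


def render_netplan(interface: str, address_cidr: str, nameserver: str) -> str:
    """Render a static-IP netplan config (hypothetical helper).

    address_cidr: the instance's current private IP with its subnet prefix,
    e.g. "10.0.1.25/24". The gateway is derived as subnet base + 1, which is
    the address a VPC reserves for its router.
    """
    iface = ipaddress.ip_interface(address_cidr)
    gateway = iface.network.network_address + 1
    return (
        "network:\n"
        "  version: 2\n"
        "  ethernets:\n"
        f"    {interface}:\n"
        "      dhcp4: false\n"
        "      addresses:\n"
        f"        - {iface.with_prefixlen}\n"
        "      routes:\n"
        "        - to: 0.0.0.0/0\n"
        f"          via: {gateway}\n"
        "      nameservers:\n"
        "        addresses:\n"
        f"          - {nameserver}\n"
    )


if __name__ == "__main__":
    # Placeholder values: substitute your instance's interface, IP/prefix and DNS server.
    print(render_netplan("ens5", "10.0.1.25/24", "10.0.0.2"), end="")
```

On Ubuntu AMIs, cloud-init normally generates the netplan file itself, so you would likely also need to disable its network configuration (for example with a file under `/etc/cloud/cloud.cfg.d/` containing `network: {config: disabled}`) before writing your own file and running `netplan try`.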