Failure to get an IP address?

0

We've had an AWS EC2 instance in us-west running Ubuntu 24/7 for 6 or 7 years now. After some early troubles, I have the VM do a superstitious reboot at 9 PM every Friday evening. After last night's reboot, the instance did not come back up. Well, that's not exactly correct; the instance came up, but it was unreachable. A reboot (several hours later) did not help; I finally had to do a "forced stop" and "start", and then it came back.

Based on the logs, it looks like DHCP failed to assign me an internal IP address. Without that, I can't be reached, of course. I don't know how to chase this any further. I couldn't find any record of an outage, and of course there's no one to ask. Is that all I can do?

asked a month ago · 261 views
2 Answers
1

You write, "and of course there's no one to ask. Is that all I can do?" Asking here isn't going to get you a definitive answer.

The correct path forward is to open a support case with AWS Support. Provide them with your instance ID and the approximate time of the reboot, and they will be able to tell you conclusively whether the host your instance was running on had a failure at that time.

Bear in mind that an EC2 instance is still a virtual machine running on a hypervisor on a physical machine, and any hardware can fail from time to time. That is especially relevant here, since you say the machine has been running for some seven years, likely on the same host. You can read about the AWS EC2 SLAs here: https://aws.amazon.com/compute/sla/. Note that for a single instance AWS has the following SLA:

For each individual Amazon EC2 instance (“Single EC2 Instance”), AWS will use commercially reasonable efforts to make the Single EC2 Instance available with an Instance-Level Uptime Percentage of at least 99.5%, in each case during any monthly billing cycle (the “Instance-Level SLA”).

Stopping and then starting the instance through the API (as opposed to rebooting it) will usually cause EC2 to place it on a new host, which tends to make this kind of issue go away.
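If you want to script that stop/start cycle, here is a minimal sketch using boto3. The function name `stop_then_start`, the instance ID, and the region in the usage comment are placeholders I made up; the client is passed in so you can see exactly which EC2 calls are involved. Keep in mind that a stop/start (unlike a reboot) discards instance-store data and changes the public IP unless you use an Elastic IP.

```python
def stop_then_start(ec2, instance_id, force=False):
    """Stop an instance, wait until it is stopped, start it again,
    wait until it is running, and return the final state name.

    `ec2` is expected to be a boto3 EC2 client (or anything with the
    same methods). `force=True` corresponds to the "force stop" option.
    """
    ec2.stop_instances(InstanceIds=[instance_id], Force=force)
    ec2.get_waiter("instance_stopped").wait(InstanceIds=[instance_id])

    ec2.start_instances(InstanceIds=[instance_id])
    ec2.get_waiter("instance_running").wait(InstanceIds=[instance_id])

    resp = ec2.describe_instances(InstanceIds=[instance_id])
    return resp["Reservations"][0]["Instances"][0]["State"]["Name"]


# Usage (placeholder region and instance ID, not taken from the question):
#   import boto3
#   ec2 = boto3.client("ec2", region_name="us-west-2")
#   print(stop_then_start(ec2, "i-0123456789abcdef0", force=True))
```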

AWS
EXPERT
answered a month ago
0

It doesn't sound like there was any problem on AWS's side. If they'd had a major outage with DHCP, a lot of customers would have noticed.

You said the initial restart didn't help. Did you check that Ubuntu actually restarted and didn't receive an IP? You said that you had to force a power-off, which strongly suggests that Ubuntu wasn't responding to the command that EC2 sent via ACPI to shut down gracefully. If Ubuntu wasn't even responding to a power-off signal (equivalent to the physical power button being pressed on a hardware machine), it was probably in a bad enough state that it wouldn't have responded to a reboot signal either.
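One way to check what the guest actually did during boot is to pull the instance's console output and look for DHCP-related lines. The sketch below assumes a boto3 EC2 client and, as far as I know, boto3 hands back the `Output` field already decoded to text. The helper names and the search pattern are my own guesses at what is worth grepping for, not anything official.

```python
import re

# Assumption: lines mentioning these terms are the interesting ones.
DHCP_PATTERN = re.compile(r"dhcp|dhclient", re.IGNORECASE)


def fetch_console_text(ec2, instance_id):
    """Return the console output of an instance as text ("" if none yet)."""
    resp = ec2.get_console_output(InstanceId=instance_id)
    return resp.get("Output", "")


def dhcp_lines(console_text):
    """Keep only the console lines that mention DHCP or dhclient."""
    return [line for line in console_text.splitlines()
            if DHCP_PATTERN.search(line)]


# Usage (placeholder instance ID):
#   import boto3
#   ec2 = boto3.client("ec2")
#   for line in dhcp_lines(fetch_console_text(ec2, "i-0123456789abcdef0")):
#       print(line)
```

If the console output shows a fresh boot sequence after each reboot attempt, Ubuntu did restart and the problem is in the network bring-up; if it doesn't, the guest never really went down.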

For a safety net, you could set a CloudWatch alarm to detect the instance health checks failing and trigger a reboot or auto-recovery (equivalent to a power-off followed by power-on) via the EC2 API. Instructions for the recover action: https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/UsingAlarmActions.html#AddingRecoverActions
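As a rough sketch of what such an alarm could look like with boto3: the function below only builds the arguments for `put_metric_alarm`, so you can inspect them before creating anything. The alarm name, period, and evaluation count are arbitrary choices of mine. Note that the recover action is tied to the system status check (`StatusCheckFailed_System`), while a guest that boots without an IP would more likely fail the instance status check, for which the reboot action is the usual pairing.

```python
def status_check_alarm_params(instance_id, region, action="recover"):
    """Build keyword arguments for CloudWatch put_metric_alarm.

    action="recover" watches the system status check;
    action="reboot" watches the instance status check.
    """
    metrics = {
        "recover": "StatusCheckFailed_System",
        "reboot": "StatusCheckFailed_Instance",
    }
    if action not in metrics:
        raise ValueError(f"unsupported action: {action!r}")

    return {
        "AlarmName": f"{action}-{instance_id}",  # hypothetical naming scheme
        "Namespace": "AWS/EC2",
        "MetricName": metrics[action],
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Maximum",
        "Period": 60,              # seconds; illustrative value
        "EvaluationPeriods": 3,    # illustrative value
        "Threshold": 1.0,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "AlarmActions": [f"arn:aws:automate:{region}:ec2:{action}"],
    }


# Usage (placeholder region and instance ID):
#   import boto3
#   cw = boto3.client("cloudwatch", region_name="us-west-2")
#   cw.put_metric_alarm(**status_check_alarm_params(
#       "i-0123456789abcdef0", "us-west-2", action="reboot"))
```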

If Ubuntu has general trouble obtaining a DHCP lease while starting up but is able to renew the lease while it's running, you could also consider simply configuring Ubuntu with a static IP instead of using DHCP. The private IP of the primary ENI remains unchanged for the lifetime of the EC2 instance.
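To illustrate the static-IP idea, here is a small sketch that renders a netplan file from the instance's private IP and its subnet CIDR. Several things here are assumptions on my part: the interface name `ens5` (older instance types often use `eth0`), the `to: default` route syntax (needs a reasonably recent netplan; older releases use `gateway4`), and the example addresses. It relies on the VPC router sitting at the first address of the subnet and on the Amazon-provided DNS at 169.254.169.253. On stock Ubuntu AMIs cloud-init also writes network config, so you would likely need to stop it from overwriting yours.

```python
import ipaddress


def netplan_static_yaml(private_ip, subnet_cidr, interface="ens5"):
    """Render a netplan config pinning `private_ip` on `interface`.

    The gateway is taken to be the first address of the subnet,
    which is where the VPC router lives.
    """
    subnet = ipaddress.ip_network(subnet_cidr)
    addr = ipaddress.ip_address(private_ip)
    if addr not in subnet:
        raise ValueError(f"{private_ip} is not inside {subnet_cidr}")

    gateway = subnet.network_address + 1
    lines = [
        "network:",
        "  version: 2",
        "  ethernets:",
        f"    {interface}:",
        "      dhcp4: false",
        f"      addresses: [{addr}/{subnet.prefixlen}]",
        "      routes:",
        "        - to: default",
        f"          via: {gateway}",
        "      nameservers:",
        "        addresses: [169.254.169.253]",
    ]
    return "\n".join(lines) + "\n"


# Usage (example addresses only):
#   print(netplan_static_yaml("10.0.1.23", "10.0.1.0/24"))
```

Test a change like this with console or Session Manager access available, because a mistake in the file leaves the instance unreachable over the network.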

EXPERT
Leo K
answered a month ago
