Failure to get an IP address?

0

We've had an AWS EC2 instance in us-west running Ubuntu 24/7 for 6 or 7 years now. After some early troubles, I have the VM do a superstitious reboot at 9 PM every Friday evening. After last night's reboot, the instance did not come back up. Well, that's not exactly correct; the instance came up, but it was unreachable. A reboot (several hours later) did not help; I finally had to do a "forced stop" and "start", and then it came back.

Based on the logs, it looks like DHCP failed to assign me an internal IP address. Without that, I can't be reached, of course. I don't know how to chase this any further. I couldn't find any record of an outage, and of course there's no one to ask. Is that all I can do?

asked 10 months ago, 381 views
2 Answers
1

You wrote: "and of course there's no one to ask. Is that all I can do?" Asking here isn't going to get you a definitive answer.

The correct path forward is to open a support case with AWS Support. Provide your instance ID and the approximate time of the reboot, and they will be able to tell you conclusively whether the host your instance was running on had a failure at that time.

Bear in mind that an EC2 instance is still a virtual machine running on a hypervisor on a physical machine, and any hardware can fail from time to time. That is especially relevant here, since you say the machine has been up for seven-odd years, so it has likely been on the same host. You can read about the AWS EC2 SLAs here: https://aws.amazon.com/compute/sla/. Note that for a single instance AWS has the following SLA:

For each individual Amazon EC2 instance (“Single EC2 Instance”), AWS will use commercially reasonable efforts to make the Single EC2 Instance available with an Instance-Level Uptime Percentage of at least 99.5%, in each case during any monthly billing cycle (the “Instance-Level SLA”).

Stopping and then starting the instance through the API (as opposed to rebooting it) will usually cause EC2 to migrate it to a new host, which typically makes this kind of issue go away.
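For illustration, here is a minimal sketch of that stop-then-start cycle with boto3. It assumes the `ec2` argument is a `boto3.client("ec2")` with permission to stop and start the instance; the function name `stop_then_start` is my own and not part of any AWS library.

```python
def stop_then_start(ec2, instance_id, force=False):
    """Stop an instance, wait until it is stopped, then start it again.

    `ec2` is assumed to be a boto3 EC2 client, e.g. boto3.client("ec2").
    """
    ids = [instance_id]
    # Force=True skips the graceful OS shutdown; use it only when a
    # normal stop hangs, as it did in the question.
    ec2.stop_instances(InstanceIds=ids, Force=force)
    ec2.get_waiter("instance_stopped").wait(InstanceIds=ids)
    ec2.start_instances(InstanceIds=ids)
    ec2.get_waiter("instance_running").wait(InstanceIds=ids)
    return instance_id
```

Keep in mind that a stop/start discards any instance store data and changes the public IP unless an Elastic IP is attached; the private IP stays the same.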

AWS
EXPERT
answered 10 months ago
0

It doesn't sound like there was any problem on AWS's side. If they'd had a major outage with DHCP, it wouldn't have gone unnoticed by a lot of customers.

You said the initial restart didn't help. Did you check that Ubuntu actually restarted and didn't receive an IP? You said that you had to force a power-off, which strongly suggests that Ubuntu wasn't responding to the command that EC2 sent via ACPI to shut down gracefully. If Ubuntu wasn't even responding to a power-off signal (equivalent to the physical power button being pressed on a hardware machine), it must have been confused enough for it to be unlikely to have responded to a reboot signal either.
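One way to check what the OS was doing without network access is to pull the instance's console output through the EC2 API and look for DHCP messages. The following is only a sketch: it assumes `ec2` is a `boto3.client("ec2")`, that boto3 returns the `Output` field already decoded to text (which I believe it does), and the helper names `fetch_console_output` and `dhcp_lines` are made up for this example.

```python
def dhcp_lines(console_text):
    """Return the lines of a console log that mention DHCP (case-insensitive)."""
    return [line for line in console_text.splitlines() if "dhcp" in line.lower()]


def fetch_console_output(ec2, instance_id):
    """Fetch the console log; `ec2` is assumed to be a boto3 EC2 client."""
    response = ec2.get_console_output(InstanceId=instance_id)
    # The "Output" key can be absent if nothing has been captured yet.
    return response.get("Output", "")
```

Calling `dhcp_lines(fetch_console_output(ec2, instance_id))` while the instance is unreachable should show whether the DHCP client ran at all during boot.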

For a safety net, you could set a CloudWatch alarm to detect the instance health checks failing and trigger a reboot or auto-recovery (equivalent to a power-off followed by power-on) via the EC2 API. Instructions for the recover action: https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/UsingAlarmActions.html#AddingRecoverActions
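As a rough sketch of such an alarm, the snippet below builds the arguments for CloudWatch's `put_metric_alarm` with the EC2 recover action. The alarm name, the 60-second period, and the two evaluation periods are assumptions chosen for illustration; adjust them to your own needs.

```python
def build_recover_alarm(instance_id, region):
    """Return keyword arguments for CloudWatch put_metric_alarm."""
    return {
        "AlarmName": f"auto-recover-{instance_id}",  # hypothetical naming scheme
        "Namespace": "AWS/EC2",
        "MetricName": "StatusCheckFailed_System",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Maximum",
        "Period": 60,            # seconds per evaluation period (assumed value)
        "EvaluationPeriods": 2,  # consecutive failing periods before acting (assumed value)
        "Threshold": 1.0,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "AlarmActions": [f"arn:aws:automate:{region}:ec2:recover"],
    }


def create_recover_alarm(instance_id, region):
    """Create the alarm; requires boto3 and AWS credentials."""
    import boto3  # third-party; imported lazily so the builder has no dependencies
    cloudwatch = boto3.client("cloudwatch", region_name=region)
    cloudwatch.put_metric_alarm(**build_recover_alarm(instance_id, region))
```

The recover action responds to system status check failures (problems on the host side). For failures inside the OS, a similar alarm on `StatusCheckFailed_Instance` with the `arn:aws:automate:<region>:ec2:reboot` action may be the better fit.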

If Ubuntu has general trouble obtaining a DHCP lease while starting up but is able to renew the lease while it's running, you could also consider simply configuring Ubuntu with a static IP instead of using DHCP. The private IP of the primary ENI remains unchanged for the lifetime of the EC2 instance.

EXPERT
answered 10 months ago
