t3.micro EC2 instance dropped off network


I have an Ubuntu t3.micro instance that dropped off the network this morning. The system logs suggest a DHCP timeout:

Aug 16 07:11:25 my-hostname systemd[1]: snapd.service: Watchdog timeout (limit 5min)!
Aug 16 07:08:59 my-hostname systemd-timesyncd[1111]: Synchronized to time server 91.189.89.199:123 (ntp.ubuntu.com).
Aug 16 07:08:44 my-hostname systemd-timesyncd[1111]: Network configuration changed, trying to establish connection.
Aug 16 07:08:42 my-hostname systemd-networkd[1099]: ens5: Could not set DHCPv4 route: Connection timed out
Aug 16 07:07:52 my-hostname systemd-networkd[1099]: ens5: Failed
Aug 16 07:07:52 my-hostname systemd-timesyncd[1111]: Synchronized to time server 91.189.89.199:123 (ntp.ubuntu.com).
Aug 16 07:07:50 my-hostname systemd-networkd[1099]: ens5: Could not set DHCPv4 route: Connection timed out
Aug 16 07:07:46 my-hostname systemd-timesyncd[1111]: Network configuration changed, trying to establish connection.
Aug 16 07:06:46 my-hostname systemd-timesyncd[1111]: Synchronized to time server 91.189.89.199:123 (ntp.ubuntu.com).
Aug 16 07:06:42 my-hostname systemd-networkd[1099]: ens5: Failed
Aug 16 07:06:41 my-hostname systemd-networkd[1099]: ens5: Could not set DHCPv4 address: Connection timed out
Aug 16 07:06:40 my-hostname systemd-timesyncd[1111]: Network configuration changed, trying to establish connection.

A network issue seems unlikely. In the past I have seen similar problems when CPU or disk burst credits were depleted and the instance was throttled. In this case my instance (i-01d287e240ebc8e5f, if any AWSers are here :)) did show a drop in CPU credits, but only by around 50%; they did not hit zero. Nobody was logged on to the box at the time, and it came up fine after a reboot, so the config is good.
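
To line the failures up against the credit graphs, it can help to pull just the systemd-networkd events out of a longer log. Here is a rough Python sketch that does that, assuming the syslog-style line format pasted above. The function names `networkd_events` and `failure_times` are made up for illustration, and the regex is only tuned to these sample lines.

```python
import re

# Matches syslog-style lines such as:
#   "Aug 16 07:06:42 my-hostname systemd-networkd[1099]: ens5: Failed"
# Assumption: timestamp, hostname, then "systemd-networkd[pid]: iface: message".
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+"
    r"systemd-networkd\[\d+\]:\s+(?P<iface>\S+):\s+(?P<msg>.*)$"
)


def networkd_events(lines):
    """Return (timestamp, interface, message) for each systemd-networkd line."""
    events = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            events.append((match.group("ts"), match.group("iface"), match.group("msg")))
    return events


def failure_times(lines):
    """Return the timestamps at which an interface was reported as Failed."""
    return [ts for ts, _iface, msg in networkd_events(lines) if msg == "Failed"]


# A few lines copied from the log above, in the order they were pasted.
SAMPLE = [
    "Aug 16 07:07:52 my-hostname systemd-networkd[1099]: ens5: Failed",
    "Aug 16 07:07:52 my-hostname systemd-timesyncd[1111]: Synchronized to time server 91.189.89.199:123 (ntp.ubuntu.com).",
    "Aug 16 07:06:42 my-hostname systemd-networkd[1099]: ens5: Failed",
    "Aug 16 07:06:41 my-hostname systemd-networkd[1099]: ens5: Could not set DHCPv4 address: Connection timed out",
]

if __name__ == "__main__":
    for ts in failure_times(SAMPLE):
        print(ts)
```

The timestamps it returns can then be compared by hand with the CPU credit and EBS burst balance graphs for the same window.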

Any thoughts on where I might look next?

asked 3 years ago, 2068 views
1 Answer

I've hit a very similar problem on two of my instances in the last week.

First on instance i-0daac824ee4991919:

Aug 20 17:00:40 process systemd-networkd[62536]: ens5: DHCPv6 address 2600:1f16:216:7c64:ab39:691d:992f:3817/128 timeout preferred 150 valid 450
Aug 20 17:01:50 process systemd-networkd[62536]: ens5: Could not set DHCPv6 address: Connection timed out
Aug 20 17:03:33 process systemd-networkd[62536]: ens5: Failed

And then on instance i-0fbfc3151b977c780:

Aug 23 13:00:10 process systemd-networkd[58373]: ens5: DHCPv6 address 2600:1f16:216:7c80:abdb:db3c:fcfd:20fd/128 timeout preferred 150 valid 450
Aug 23 13:01:37 process systemd-networkd[58373]: ens5: Could not set NDisc route or address: Connection timed out
Aug 23 13:03:59 process systemd-networkd[58373]: ens5: Failed

In both cases, the problem seems to be twofold. The first part is that the instance gets very slow. Many different things seem able to trigger this, including running out of CPU credits, running out of EBS burst credits, and possibly running out of memory or coming very close to it.

The slowdown on my second instance looks at least somewhat like the out-of-memory situation, but I have no clue what caused it on the first.

The second part of the problem is that systemd-networkd apparently never tries to recover once the interface goes into a Failed state. This is very confusing to me, and I'd really like AWS support to give us an answer on how to resolve it. I'd also love some guidance on getting things like memory pressure into the instance metrics.
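
On the memory pressure side, one thing that can at least be sampled from inside the instance is the Linux pressure stall information (PSI) file `/proc/pressure/memory`. Below is a minimal Python sketch that parses it. It assumes the running kernel has PSI enabled (the file does not exist otherwise), and the function names `parse_psi` and `read_memory_pressure` are hypothetical. Shipping the numbers to a metrics service would be a separate step that is not shown here.

```python
from pathlib import Path


def parse_psi(text):
    """Parse PSI text into {"some": {...}, "full": {...}} with float values.

    Expected input looks like:
        some avg10=0.00 avg60=0.00 avg300=0.00 total=0
        full avg10=0.00 avg60=0.00 avg300=0.00 total=0
    """
    result = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        kind, *fields = line.split()
        result[kind] = {
            key: float(value)
            for key, value in (field.split("=", 1) for field in fields)
        }
    return result


def read_memory_pressure(path="/proc/pressure/memory"):
    """Read and parse the PSI memory file; raises FileNotFoundError without PSI."""
    return parse_psi(Path(path).read_text())


if __name__ == "__main__":
    pressure = read_memory_pressure()
    # avg10 is the percentage of the last 10 seconds in which at least one
    # task was stalled waiting on memory.
    print(pressure["some"]["avg10"])
```

Logging that value every minute or so would show whether memory stalls were climbing in the run-up to one of these DHCP timeouts.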

answered 3 years ago
