What causes CPU overload followed by a server outage after the "Stopped Snap Daemon" / "Starting Snap Daemon" log messages on EC2?

I was running a Spring web server on Ubuntu on AWS EC2. During off-duty hours, the instance unexpectedly failed its status check, and a forced restart followed.

According to the monitoring tool, CPU usage rose rapidly, and a few minutes later the server stopped working normally.

After the forced restart it worked normally again. When I checked the system log, I found the following:

Oct  6 18:32:45 ip-12-0-10-30 snapd[62550]: github.com/snapcore/snapd/overlord/ifacestate/udevmonitor.(*Monitor).Run.func1(0xc00004cfc8, 0x55568c85dad2)
Oct  6 18:32:45 ip-12-0-10-30 snapd[62550]: #011/build/snapd/parts/snapd-deb/build/overlord/ifacestate/udevmonitor/udevmon.go:147 +0x329
Oct  6 18:32:45 ip-12-0-10-30 snapd[62550]: gopkg.in/tomb%2ev2.(*Tomb).run(0xc000152c60, 0xc000076b40)
Oct  6 18:32:46 ip-12-0-10-30 systemd[1]: snapd.service: Failed with result 'watchdog'.
Oct  6 18:32:46 ip-12-0-10-30 snapd[62550]: #011/build/snapd/parts/snapd-deb/build/vendor/gopkg.in/tomb.v2/tomb.go:163 +0x2d
Oct  6 18:32:47 ip-12-0-10-30 snapd[62550]: created by gopkg.in/tomb%2ev2.(*Tomb).Go
Oct  6 18:32:47 ip-12-0-10-30 snapd[62550]: #011/build/snapd/parts/snapd-deb/build/vendor/gopkg.in/tomb.v2/tomb.go:159 +0xc9
Oct  6 18:32:48 ip-12-0-10-30 snapd[62550]: rax    0xca
Oct  6 18:32:49 ip-12-0-10-30 snapd[62550]: rbx    0x55568def63a0
Oct  6 18:32:49 ip-12-0-10-30 snapd[62550]: rcx    0x55568c42a793
Oct  6 18:32:50 ip-12-0-10-30 snapd[62550]: rdx    0x0
Oct  6 18:32:50 ip-12-0-10-30 snapd[62550]: rdi    0x55568def64e8
Oct  6 18:32:50 ip-12-0-10-30 snapd[62550]: rsi    0x80
Oct  6 18:32:50 ip-12-0-10-30 snapd[62550]: rbp    0x7ffc63c1fd38
Oct  6 18:32:50 ip-12-0-10-30 snapd[62550]: rsp    0x7ffc63c1fcf0
Oct  6 18:32:50 ip-12-0-10-30 snapd[62550]: r8     0x0
Oct  6 18:32:50 ip-12-0-10-30 snapd[62550]: r9     0x0
Oct  6 18:32:50 ip-12-0-10-30 snapd[62550]: r10    0x0
Oct  6 18:32:50 ip-12-0-10-30 snapd[62550]: r11    0x286
Oct  6 18:32:50 ip-12-0-10-30 snapd[62550]: r12    0xff
Oct  6 18:32:50 ip-12-0-10-30 snapd[62550]: r13    0x0
Oct  6 18:32:50 ip-12-0-10-30 snapd[62550]: r14    0x55568d089af2
Oct  6 18:32:50 ip-12-0-10-30 snapd[62550]: r15    0x0
Oct  6 18:32:50 ip-12-0-10-30 snapd[62550]: rip    0x55568c42a791
Oct  6 18:32:50 ip-12-0-10-30 snapd[62550]: rflags 0x286
Oct  6 18:32:50 ip-12-0-10-30 snapd[62550]: cs     0x33
Oct  6 18:32:50 ip-12-0-10-30 snapd[62550]: fs     0x0
Oct  6 18:32:50 ip-12-0-10-30 snapd[62550]: gs     0x0
Oct  6 18:32:50 ip-12-0-10-30 systemd[1]: snapd.service: Scheduled restart job, restart counter is at 3.
Oct  6 18:32:51 ip-12-0-10-30 snapd[98728]: AppArmor status: apparmor is enabled and all features are available
Oct  6 18:32:51 ip-12-0-10-30 systemd[1]: Stopped Snap Daemon.
Oct  6 18:32:52 ip-12-0-10-30 systemd[1]: Starting Snap Daemon...
Oct  6 18:32:52 ip-12-0-10-30 systemd[1]: snapd.service: start operation timed out. Terminating.
Oct  6 18:32:53 ip-12-0-10-30 systemd[1]: snapd.service: Failed with result 'timeout'.

skip

Oct  6 19:01:27 ip-12-0-10-30 systemd[1]: Stopped Snap Daemon.
Oct  6 19:02:54 ip-12-0-10-30 systemd[1]: Starting Snap Daemon...
Oct  6 19:05:14 ip-12-0-10-30 systemd[1]: snapd.service: start operation timed out. Terminating.
Oct  6 19:07:10 ip-12-0-10-30 systemd[1]: snapd.service: Failed with result 'timeout'.
Oct  6 19:08:57 ip-12-0-10-30 systemd[1]: Failed to start Snap Daemon.
Oct  6 19:14:41 ip-12-0-10-30 systemd[1]: snapd.service: Scheduled restart job, restart counter is at 20.
Oct  6 19:16:14 ip-12-0-10-30 systemd[1]: Stopped Snap Daemon.
Oct  6 19:18:46 ip-12-0-10-30 systemd[1]: Starting Snap Daemon...
Oct  6 19:22:44 ip-12-0-10-30 CRON[99169]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)
Oct  6 19:31:10 ip-12-0-10-30 systemd-networkd[425]: ens5: Could not set DHCPv4 address: Connection timed out
Oct  6 19:33:19 ip-12-0-10-30 systemd-networkd[425]: ens5: Failed
Oct  6 20:41:15 ip-12-0-10-30 CRON[99214]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)
Oct  6 20:50:55 ip-12-0-10-30 systemd-timesyncd[352]: Network configuration changed, trying to establish connection.
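
To get a quick overview of the restart loop in a log like the one above, something along these lines might help. This is only a minimal sketch: the sample lines are copied from the excerpt, the two regular expressions assume the systemd message wording shown there, and `summarize_snapd` is a hypothetical helper name, not part of any tool.

```python
import re
from collections import Counter

# A few lines copied from the syslog excerpt above.
SAMPLE = """\
Oct  6 18:32:46 ip-12-0-10-30 systemd[1]: snapd.service: Failed with result 'watchdog'.
Oct  6 18:32:50 ip-12-0-10-30 systemd[1]: snapd.service: Scheduled restart job, restart counter is at 3.
Oct  6 18:32:53 ip-12-0-10-30 systemd[1]: snapd.service: Failed with result 'timeout'.
Oct  6 19:07:10 ip-12-0-10-30 systemd[1]: snapd.service: Failed with result 'timeout'.
Oct  6 19:14:41 ip-12-0-10-30 systemd[1]: snapd.service: Scheduled restart job, restart counter is at 20.
"""

# Patterns assume the exact message wording seen in the excerpt.
FAILED = re.compile(r"snapd\.service: Failed with result '(\w+)'")
RESTART = re.compile(
    r"snapd\.service: Scheduled restart job, restart counter is at (\d+)"
)


def summarize_snapd(lines):
    """Count snapd failure results and find the highest restart counter."""
    failures = Counter()
    max_restart = 0
    for line in lines:
        match = FAILED.search(line)
        if match:
            failures[match.group(1)] += 1
            continue
        match = RESTART.search(line)
        if match:
            max_restart = max(max_restart, int(match.group(1)))
    return dict(failures), max_restart


if __name__ == "__main__":
    print(summarize_snapd(SAMPLE.splitlines()))
```

Running the same function over the full syslog file (for example, the lines of /var/log/syslog, assuming that is where the messages are written) would show how many watchdog and timeout failures occurred and how high the restart counter climbed.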

[Image: EC2 CPU monitoring graph]

The graph above shows a sharp increase in CPU usage.

CPU usage started rising at 17:40 and climbed sharply from 17:44. It peaked at 93% at 17:55 and dropped at 18:55.

The status check started failing at 20:25, and the server went down around 20:30.
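
To line up the times from the graph with the log, the offsets from the start of the CPU rise can be worked out as below. This is a rough sketch that assumes the monitoring graph and the syslog use the same time zone (not confirmed), and it uses 18:32 only because that is the earliest snapd watchdog failure in the excerpt, not necessarily the first one overall. `minutes_since_start` is a hypothetical helper name.

```python
from datetime import datetime

# Times taken from the question; all assumed to be on the same day
# and in the same time zone (an assumption, not confirmed).
EVENTS = [
    ("17:40", "CPU usage starts rising"),
    ("17:44", "CPU usage climbs sharply"),
    ("17:55", "CPU usage peaks at 93%"),
    ("18:32", "snapd watchdog failure in the log excerpt"),
    ("18:55", "CPU usage drops"),
    ("20:25", "status check starts failing"),
    ("20:30", "server goes down"),
]


def minutes_since_start(events):
    """Return (minutes after the first event, label) for each event."""
    start = datetime.strptime(events[0][0], "%H:%M")
    result = []
    for time_text, label in events:
        delta = datetime.strptime(time_text, "%H:%M") - start
        result.append((int(delta.total_seconds() // 60), label))
    return result


if __name__ == "__main__":
    for minutes, label in minutes_since_start(EVENTS):
        print(f"+{minutes:3d} min  {label}")
```

Under those assumptions, the snapd watchdog failure in the excerpt comes well after the CPU usage began to rise rather than before it.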

My guess is that the trigger was "snapd.service: Watchdog timeout (limit 5min)!". After that, snapd appears to have been stopped and restarted repeatedly.

What is the cause of this and what can be done to prevent it?

joker
asked a year ago, 67 views
No Answers