What causes CPU overload and then a server outage after the log messages "Stopped Snap Daemon" and "Starting Snap Daemon" on EC2?
I was running a Spring web server on Ubuntu on AWS EC2. During off-duty hours the instance unexpectedly failed a status check and was forcibly restarted.
According to the monitoring tool, the server stopped working normally a few minutes after CPU usage rose rapidly.
After the forced restart it worked normally again. When I checked the system log, I found the following.
Oct 6 18:32:45 ip-12-0-10-30 snapd[62550]: github.com/snapcore/snapd/overlord/ifacestate/udevmonitor.(*Monitor).Run.func1(0xc00004cfc8, 0x55568c85dad2)
Oct 6 18:32:45 ip-12-0-10-30 snapd[62550]: #011/build/snapd/parts/snapd-deb/build/overlord/ifacestate/udevmonitor/udevmon.go:147 +0x329
Oct 6 18:32:45 ip-12-0-10-30 snapd[62550]: gopkg.in/tomb%2ev2.(*Tomb).run(0xc000152c60, 0xc000076b40)
Oct 6 18:32:46 ip-12-0-10-30 systemd[1]: snapd.service: Failed with result 'watchdog'.
Oct 6 18:32:46 ip-12-0-10-30 snapd[62550]: #011/build/snapd/parts/snapd-deb/build/vendor/gopkg.in/tomb.v2/tomb.go:163 +0x2d
Oct 6 18:32:47 ip-12-0-10-30 snapd[62550]: created by gopkg.in/tomb%2ev2.(*Tomb).Go
Oct 6 18:32:47 ip-12-0-10-30 snapd[62550]: #011/build/snapd/parts/snapd-deb/build/vendor/gopkg.in/tomb.v2/tomb.go:159 +0xc9
Oct 6 18:32:48 ip-12-0-10-30 snapd[62550]: rax 0xca
Oct 6 18:32:49 ip-12-0-10-30 snapd[62550]: rbx 0x55568def63a0
Oct 6 18:32:49 ip-12-0-10-30 snapd[62550]: rcx 0x55568c42a793
Oct 6 18:32:50 ip-12-0-10-30 snapd[62550]: rdx 0x0
Oct 6 18:32:50 ip-12-0-10-30 snapd[62550]: rdi 0x55568def64e8
Oct 6 18:32:50 ip-12-0-10-30 snapd[62550]: rsi 0x80
Oct 6 18:32:50 ip-12-0-10-30 snapd[62550]: rbp 0x7ffc63c1fd38
Oct 6 18:32:50 ip-12-0-10-30 snapd[62550]: rsp 0x7ffc63c1fcf0
Oct 6 18:32:50 ip-12-0-10-30 snapd[62550]: r8 0x0
Oct 6 18:32:50 ip-12-0-10-30 snapd[62550]: r9 0x0
Oct 6 18:32:50 ip-12-0-10-30 snapd[62550]: r10 0x0
Oct 6 18:32:50 ip-12-0-10-30 snapd[62550]: r11 0x286
Oct 6 18:32:50 ip-12-0-10-30 snapd[62550]: r12 0xff
Oct 6 18:32:50 ip-12-0-10-30 snapd[62550]: r13 0x0
Oct 6 18:32:50 ip-12-0-10-30 snapd[62550]: r14 0x55568d089af2
Oct 6 18:32:50 ip-12-0-10-30 snapd[62550]: r15 0x0
Oct 6 18:32:50 ip-12-0-10-30 snapd[62550]: rip 0x55568c42a791
Oct 6 18:32:50 ip-12-0-10-30 snapd[62550]: rflags 0x286
Oct 6 18:32:50 ip-12-0-10-30 snapd[62550]: cs 0x33
Oct 6 18:32:50 ip-12-0-10-30 snapd[62550]: fs 0x0
Oct 6 18:32:50 ip-12-0-10-30 snapd[62550]: gs 0x0
Oct 6 18:32:50 ip-12-0-10-30 systemd[1]: snapd.service: Scheduled restart job, restart counter is at 3.
Oct 6 18:32:51 ip-12-0-10-30 snapd[98728]: AppArmor status: apparmor is enabled and all features are available
Oct 6 18:32:51 ip-12-0-10-30 systemd[1]: Stopped Snap Daemon.
Oct 6 18:32:52 ip-12-0-10-30 systemd[1]: Starting Snap Daemon...
Oct 6 18:32:52 ip-12-0-10-30 systemd[1]: snapd.service: start operation timed out. Terminating.
Oct 6 18:32:53 ip-12-0-10-30 systemd[1]: snapd.service: Failed with result 'timeout'.
(log lines omitted)
Oct 6 19:01:27 ip-12-0-10-30 systemd[1]: Stopped Snap Daemon.
Oct 6 19:02:54 ip-12-0-10-30 systemd[1]: Starting Snap Daemon...
Oct 6 19:05:14 ip-12-0-10-30 systemd[1]: snapd.service: start operation timed out. Terminating.
Oct 6 19:07:10 ip-12-0-10-30 systemd[1]: snapd.service: Failed with result 'timeout'.
Oct 6 19:08:57 ip-12-0-10-30 systemd[1]: Failed to start Snap Daemon.
Oct 6 19:14:41 ip-12-0-10-30 systemd[1]: snapd.service: Scheduled restart job, restart counter is at 20.
Oct 6 19:16:14 ip-12-0-10-30 systemd[1]: Stopped Snap Daemon.
Oct 6 19:18:46 ip-12-0-10-30 systemd[1]: Starting Snap Daemon...
Oct 6 19:22:44 ip-12-0-10-30 CRON[99169]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
Oct 6 19:31:10 ip-12-0-10-30 systemd-networkd[425]: ens5: Could not set DHCPv4 address: Connection timed out
Oct 6 19:33:19 ip-12-0-10-30 systemd-networkd[425]: ens5: Failed
Oct 6 20:41:15 ip-12-0-10-30 CRON[99214]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
Oct 6 20:50:55 ip-12-0-10-30 systemd-timesyncd[352]: Network configuration changed, trying to establish connection.
The graph above shows a sharp increase in CPU usage.
CPU usage started to rise at 17:40 and rose sharply from 17:44. It peaked at 93% at 17:55 and dropped at 18:55.
The status check started failing at 20:25, and the server went down around 20:30.
My guess is that the trigger was "snapd.service: Watchdog timeout (limit 5min)!". After that message, snapd appears to have been stopped and started repeatedly.
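To check this guess against the log, a minimal Python sketch like the one below could pull out the systemd lines for snapd.service and summarize how often the unit was rescheduled and with which failure results. The function name `summarize_snapd` and the `sample` list are hypothetical names chosen for illustration; the sample lines are copied from the excerpt above, and the regular expressions assume the syslog format shown there.

```python
import re

# Matches systemd lines of the form seen in the excerpt, e.g.
# "... systemd[1]: snapd.service: Scheduled restart job, restart counter is at 3."
RESTART_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) \S+ systemd\[1\]: "
    r"snapd\.service: Scheduled restart job, restart counter is at (?P<count>\d+)\."
)
# Matches "... systemd[1]: snapd.service: Failed with result 'watchdog'."
FAILED_RE = re.compile(
    r"systemd\[1\]: snapd\.service: Failed with result '(?P<result>\w+)'\."
)


def summarize_snapd(lines):
    """Return (restarts, failures) for snapd.service found in syslog lines.

    restarts: list of (timestamp, restart_counter) tuples in log order.
    failures: dict mapping failure result (e.g. 'watchdog') to its count.
    """
    restarts = []
    failures = {}
    for line in lines:
        m = RESTART_RE.search(line)
        if m:
            restarts.append((m.group("ts"), int(m.group("count"))))
            continue
        m = FAILED_RE.search(line)
        if m:
            result = m.group("result")
            failures[result] = failures.get(result, 0) + 1
    return restarts, failures


# Sample lines copied from the log excerpt in this post.
sample = [
    "Oct 6 18:32:46 ip-12-0-10-30 systemd[1]: snapd.service: Failed with result 'watchdog'.",
    "Oct 6 18:32:50 ip-12-0-10-30 systemd[1]: snapd.service: Scheduled restart job, restart counter is at 3.",
    "Oct 6 18:32:53 ip-12-0-10-30 systemd[1]: snapd.service: Failed with result 'timeout'.",
    "Oct 6 19:07:10 ip-12-0-10-30 systemd[1]: snapd.service: Failed with result 'timeout'.",
    "Oct 6 19:14:41 ip-12-0-10-30 systemd[1]: snapd.service: Scheduled restart job, restart counter is at 20.",
]

restarts, failures = summarize_snapd(sample)
print(restarts)  # [('Oct 6 18:32:50', 3), ('Oct 6 19:14:41', 20)]
print(failures)  # {'watchdog': 1, 'timeout': 2}
```

On the real machine the same function could be fed the lines of the full system log file instead of `sample`. A restart counter that climbs from 3 to 20 in under an hour would be consistent with a restart loop, although it does not by itself show whether snapd caused the CPU load or was a victim of it.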
What is the cause of this and what can be done to prevent it?