What causes CPU overload followed by a server outage after "Stopped Snap Daemon" / "Starting Snap Daemon" log entries on EC2?


I was running a Spring web server on Ubuntu on an AWS EC2 instance. During off-hours, the instance unexpectedly failed its status checks, and a restart had to be forced.

According to the monitoring tool, CPU usage rose rapidly, and a few minutes later the server stopped working normally.

After the forced restart it worked normally again. When I checked the system log, I found the following:

Oct  6 18:32:45 ip-12-0-10-30 snapd[62550]: github.com/snapcore/snapd/overlord/ifacestate/udevmonitor.(*Monitor).Run.func1(0xc00004cfc8, 0x55568c85dad2)
Oct  6 18:32:45 ip-12-0-10-30 snapd[62550]: #011/build/snapd/parts/snapd-deb/build/overlord/ifacestate/udevmonitor/udevmon.go:147 +0x329
Oct  6 18:32:45 ip-12-0-10-30 snapd[62550]: gopkg.in/tomb%2ev2.(*Tomb).run(0xc000152c60, 0xc000076b40)
Oct  6 18:32:46 ip-12-0-10-30 systemd[1]: snapd.service: Failed with result 'watchdog'.
Oct  6 18:32:46 ip-12-0-10-30 snapd[62550]: #011/build/snapd/parts/snapd-deb/build/vendor/gopkg.in/tomb.v2/tomb.go:163 +0x2d
Oct  6 18:32:47 ip-12-0-10-30 snapd[62550]: created by gopkg.in/tomb%2ev2.(*Tomb).Go
Oct  6 18:32:47 ip-12-0-10-30 snapd[62550]: #011/build/snapd/parts/snapd-deb/build/vendor/gopkg.in/tomb.v2/tomb.go:159 +0xc9
Oct  6 18:32:48 ip-12-0-10-30 snapd[62550]: rax    0xca
Oct  6 18:32:49 ip-12-0-10-30 snapd[62550]: rbx    0x55568def63a0
Oct  6 18:32:49 ip-12-0-10-30 snapd[62550]: rcx    0x55568c42a793
Oct  6 18:32:50 ip-12-0-10-30 snapd[62550]: rdx    0x0
Oct  6 18:32:50 ip-12-0-10-30 snapd[62550]: rdi    0x55568def64e8
Oct  6 18:32:50 ip-12-0-10-30 snapd[62550]: rsi    0x80
Oct  6 18:32:50 ip-12-0-10-30 snapd[62550]: rbp    0x7ffc63c1fd38
Oct  6 18:32:50 ip-12-0-10-30 snapd[62550]: rsp    0x7ffc63c1fcf0
Oct  6 18:32:50 ip-12-0-10-30 snapd[62550]: r8     0x0
Oct  6 18:32:50 ip-12-0-10-30 snapd[62550]: r9     0x0
Oct  6 18:32:50 ip-12-0-10-30 snapd[62550]: r10    0x0
Oct  6 18:32:50 ip-12-0-10-30 snapd[62550]: r11    0x286
Oct  6 18:32:50 ip-12-0-10-30 snapd[62550]: r12    0xff
Oct  6 18:32:50 ip-12-0-10-30 snapd[62550]: r13    0x0
Oct  6 18:32:50 ip-12-0-10-30 snapd[62550]: r14    0x55568d089af2
Oct  6 18:32:50 ip-12-0-10-30 snapd[62550]: r15    0x0
Oct  6 18:32:50 ip-12-0-10-30 snapd[62550]: rip    0x55568c42a791
Oct  6 18:32:50 ip-12-0-10-30 snapd[62550]: rflags 0x286
Oct  6 18:32:50 ip-12-0-10-30 snapd[62550]: cs     0x33
Oct  6 18:32:50 ip-12-0-10-30 snapd[62550]: fs     0x0
Oct  6 18:32:50 ip-12-0-10-30 snapd[62550]: gs     0x0
Oct  6 18:32:50 ip-12-0-10-30 systemd[1]: snapd.service: Scheduled restart job, restart counter is at 3.
Oct  6 18:32:51 ip-12-0-10-30 snapd[98728]: AppArmor status: apparmor is enabled and all features are available
Oct  6 18:32:51 ip-12-0-10-30 systemd[1]: Stopped Snap Daemon.
Oct  6 18:32:52 ip-12-0-10-30 systemd[1]: Starting Snap Daemon...
Oct  6 18:32:52 ip-12-0-10-30 systemd[1]: snapd.service: start operation timed out. Terminating.
Oct  6 18:32:53 ip-12-0-10-30 systemd[1]: snapd.service: Failed with result 'timeout'.

(intermediate log lines omitted)

Oct  6 19:01:27 ip-12-0-10-30 systemd[1]: Stopped Snap Daemon.
Oct  6 19:02:54 ip-12-0-10-30 systemd[1]: Starting Snap Daemon...
Oct  6 19:05:14 ip-12-0-10-30 systemd[1]: snapd.service: start operation timed out. Terminating.
Oct  6 19:07:10 ip-12-0-10-30 systemd[1]: snapd.service: Failed with result 'timeout'.
Oct  6 19:08:57 ip-12-0-10-30 systemd[1]: Failed to start Snap Daemon.
Oct  6 19:14:41 ip-12-0-10-30 systemd[1]: snapd.service: Scheduled restart job, restart counter is at 20.
Oct  6 19:16:14 ip-12-0-10-30 systemd[1]: Stopped Snap Daemon.
Oct  6 19:18:46 ip-12-0-10-30 systemd[1]: Starting Snap Daemon...
Oct  6 19:22:44 ip-12-0-10-30 CRON[99169]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)
Oct  6 19:31:10 ip-12-0-10-30 systemd-networkd[425]: ens5: Could not set DHCPv4 address: Connection timed out
Oct  6 19:33:19 ip-12-0-10-30 systemd-networkd[425]: ens5: Failed
Oct  6 20:41:15 ip-12-0-10-30 CRON[99214]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)
Oct  6 20:50:55 ip-12-0-10-30 systemd-timesyncd[352]: Network configuration changed, trying to establish connection.

[Figure: EC2 CPU monitoring graph]

The graph above shows a sharp increase in CPU usage.

CPU usage began to rise at 17:40 and climbed sharply from 17:44. It peaked at 93% at 17:55 and dropped at 18:55.

Status checks started failing at 20:25, and the server went down at around 20:30.

My guess is that this is related to the log message "snapd.service: Watchdog timeout (limit 5min)!". After that, snapd appears to have kept stopping and starting.
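To check that guess against the full log rather than the excerpt, something like the following could tally how often systemd reported snapd.service failing and how high the restart counter went. This is only a rough sketch: the function name `summarize_snapd` and the regular expressions are my own and assume the classic syslog line format shown above, and the sample lines are copied from the excerpt.

```python
import re
from collections import Counter

# Classic syslog layout: "Oct  6 18:32:46 host process[pid]: message"
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s[\d:]{8})\s+\S+\s+(?P<proc>[\w.-]+)\[\d+\]:\s(?P<msg>.*)$"
)
RESULT_RE = re.compile(r"snapd\.service: Failed with result '(?P<result>[^']+)'")
COUNTER_RE = re.compile(
    r"snapd\.service: Scheduled restart job, restart counter is at (?P<n>\d+)"
)


def summarize_snapd(lines):
    """Count snapd.service failure results and track the highest restart counter."""
    results = Counter()
    first_failure = None
    max_restart = 0
    for line in lines:
        m = LINE_RE.match(line)
        # Only systemd's own messages about the unit are of interest here.
        if not m or m.group("proc") != "systemd":
            continue
        msg = m.group("msg")
        failed = RESULT_RE.search(msg)
        if failed:
            results[failed.group("result")] += 1
            if first_failure is None:
                first_failure = m.group("ts")
        restart = COUNTER_RE.search(msg)
        if restart:
            max_restart = max(max_restart, int(restart.group("n")))
    return {
        "results": dict(results),
        "first_failure": first_failure,
        "max_restart": max_restart,
    }


# Sample lines taken from the log excerpt in this question.
SAMPLE = [
    "Oct  6 18:32:46 ip-12-0-10-30 systemd[1]: snapd.service: Failed with result 'watchdog'.",
    "Oct  6 18:32:50 ip-12-0-10-30 systemd[1]: snapd.service: Scheduled restart job, restart counter is at 3.",
    "Oct  6 18:32:53 ip-12-0-10-30 systemd[1]: snapd.service: Failed with result 'timeout'.",
    "Oct  6 19:07:10 ip-12-0-10-30 systemd[1]: snapd.service: Failed with result 'timeout'.",
    "Oct  6 19:14:41 ip-12-0-10-30 systemd[1]: snapd.service: Scheduled restart job, restart counter is at 20.",
]

if __name__ == "__main__":
    print(summarize_snapd(SAMPLE))
```

On the instance itself, the same function could be fed the real log, for example `summarize_snapd(open("/var/log/syslog"))`, assuming the log is at the usual Ubuntu location. Comparing the first failure timestamp with the CPU graph would show whether the snapd restarts began before or after the CPU spike.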

What is the cause of this and what can be done to prevent it?

joker
Asked 2 years ago · 67 views
No answers
