Questions tagged with Linux Provisioning


What causes CPU overload and then a server outage after "Stopped Snap Daemon" / "Starting Snap Daemon" log entries on EC2?

I was running a Spring web server on Ubuntu on AWS EC2. During off-duty hours the instance unexpectedly failed a status check and a restart was forced. According to the monitoring tool, the server stopped working normally a few minutes after CPU usage rose rapidly. After the forced restart it worked normally again. When I checked the system log, it showed the following:

```
Oct 6 18:32:45 ip-12-0-10-30 snapd[62550]: github.com/snapcore/snapd/overlord/ifacestate/udevmonitor.(*Monitor).Run.func1(0xc00004cfc8, 0x55568c85dad2)
Oct 6 18:32:45 ip-12-0-10-30 snapd[62550]: #011/build/snapd/parts/snapd-deb/build/overlord/ifacestate/udevmonitor/udevmon.go:147 +0x329
Oct 6 18:32:45 ip-12-0-10-30 snapd[62550]: gopkg.in/tomb%2ev2.(*Tomb).run(0xc000152c60, 0xc000076b40)
Oct 6 18:32:46 ip-12-0-10-30 systemd[1]: snapd.service: Failed with result 'watchdog'.
Oct 6 18:32:46 ip-12-0-10-30 snapd[62550]: #011/build/snapd/parts/snapd-deb/build/vendor/gopkg.in/tomb.v2/tomb.go:163 +0x2d
Oct 6 18:32:47 ip-12-0-10-30 snapd[62550]: created by gopkg.in/tomb%2ev2.(*Tomb).Go
Oct 6 18:32:47 ip-12-0-10-30 snapd[62550]: #011/build/snapd/parts/snapd-deb/build/vendor/gopkg.in/tomb.v2/tomb.go:159 +0xc9
Oct 6 18:32:48 ip-12-0-10-30 snapd[62550]: rax 0xca
Oct 6 18:32:49 ip-12-0-10-30 snapd[62550]: rbx 0x55568def63a0
Oct 6 18:32:49 ip-12-0-10-30 snapd[62550]: rcx 0x55568c42a793
Oct 6 18:32:50 ip-12-0-10-30 snapd[62550]: rdx 0x0
Oct 6 18:32:50 ip-12-0-10-30 snapd[62550]: rdi 0x55568def64e8
Oct 6 18:32:50 ip-12-0-10-30 snapd[62550]: rsi 0x80
Oct 6 18:32:50 ip-12-0-10-30 snapd[62550]: rbp 0x7ffc63c1fd38
Oct 6 18:32:50 ip-12-0-10-30 snapd[62550]: rsp 0x7ffc63c1fcf0
Oct 6 18:32:50 ip-12-0-10-30 snapd[62550]: r8 0x0
Oct 6 18:32:50 ip-12-0-10-30 snapd[62550]: r9 0x0
Oct 6 18:32:50 ip-12-0-10-30 snapd[62550]: r10 0x0
Oct 6 18:32:50 ip-12-0-10-30 snapd[62550]: r11 0x286
Oct 6 18:32:50 ip-12-0-10-30 snapd[62550]: r12 0xff
Oct 6 18:32:50 ip-12-0-10-30 snapd[62550]: r13 0x0
Oct 6 18:32:50 ip-12-0-10-30 snapd[62550]: r14 0x55568d089af2
Oct 6 18:32:50 ip-12-0-10-30 snapd[62550]: r15 0x0
Oct 6 18:32:50 ip-12-0-10-30 snapd[62550]: rip 0x55568c42a791
Oct 6 18:32:50 ip-12-0-10-30 snapd[62550]: rflags 0x286
Oct 6 18:32:50 ip-12-0-10-30 snapd[62550]: cs 0x33
Oct 6 18:32:50 ip-12-0-10-30 snapd[62550]: fs 0x0
Oct 6 18:32:50 ip-12-0-10-30 snapd[62550]: gs 0x0
Oct 6 18:32:50 ip-12-0-10-30 systemd[1]: snapd.service: Scheduled restart job, restart counter is at 3.
Oct 6 18:32:51 ip-12-0-10-30 snapd[98728]: AppArmor status: apparmor is enabled and all features are available
Oct 6 18:32:51 ip-12-0-10-30 systemd[1]: Stopped Snap Daemon.
Oct 6 18:32:52 ip-12-0-10-30 systemd[1]: Starting Snap Daemon...
Oct 6 18:32:52 ip-12-0-10-30 systemd[1]: snapd.service: start operation timed out. Terminating.
Oct 6 18:32:53 ip-12-0-10-30 systemd[1]: snapd.service: Failed with result 'timeout'.
```

skip

```
Oct 6 19:01:27 ip-12-0-10-30 systemd[1]: Stopped Snap Daemon.
Oct 6 19:02:54 ip-12-0-10-30 systemd[1]: Starting Snap Daemon...
Oct 6 19:05:14 ip-12-0-10-30 systemd[1]: snapd.service: start operation timed out. Terminating.
Oct 6 19:07:10 ip-12-0-10-30 systemd[1]: snapd.service: Failed with result 'timeout'.
Oct 6 19:08:57 ip-12-0-10-30 systemd[1]: Failed to start Snap Daemon.
Oct 6 19:14:41 ip-12-0-10-30 systemd[1]: snapd.service: Scheduled restart job, restart counter is at 20.
Oct 6 19:16:14 ip-12-0-10-30 systemd[1]: Stopped Snap Daemon.
Oct 6 19:18:46 ip-12-0-10-30 systemd[1]: Starting Snap Daemon...
Oct 6 19:22:44 ip-12-0-10-30 CRON[99169]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
Oct 6 19:31:10 ip-12-0-10-30 systemd-networkd[425]: ens5: Could not set DHCPv4 address: Connection timed out
Oct 6 19:33:19 ip-12-0-10-30 systemd-networkd[425]: ens5: Failed
Oct 6 20:41:15 ip-12-0-10-30 CRON[99214]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
Oct 6 20:50:55 ip-12-0-10-30 systemd-timesyncd[352]: Network configuration changed, trying to establish connection.
```

![ec2 cpu monitoring graph](/media/postImages/original/IMX8ujANjPTkOit8MHrBQeUg)

The graph above shows a sharp increase in CPU usage. Usage began rising at 17:40 and climbed sharply from 17:44. It peaked at 93% at 17:55 and dropped at 18:55. The status check started failing at 20:25, and the server went down around 20:30. My guess is that this is related to `snapd.service: Watchdog timeout (limit 5min)!`, after which snapd appears to have repeatedly stopped and started. What causes this, and what can be done to prevent it?
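To line the snapd events up against the CPU graph, it may help to reduce the syslog to just the `snapd.service` failure results and restart counters reported by systemd. The following is a minimal sketch that assumes the log line format shown in the excerpt above; the function name `summarize_snapd` and the `SAMPLE` list are hypothetical and only for illustration, and it does not diagnose the cause.

```python
import re
from collections import Counter

# Matches syslog lines in the format shown in the excerpt, e.g.
# "Oct 6 18:32:46 ip-12-0-10-30 systemd[1]: snapd.service: Failed with result 'watchdog'."
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+(?P<host>\S+)\s+"
    r"(?P<proc>[^\[\s:]+)(?:\[(?P<pid>\d+)\])?:\s(?P<msg>.*)$"
)
RESULT_RE = re.compile(r"snapd\.service: Failed with result '(?P<result>[^']+)'")
COUNTER_RE = re.compile(
    r"snapd\.service: Scheduled restart job, restart counter is at (?P<n>\d+)"
)

# A few lines copied from the excerpt above (hypothetical sample input).
SAMPLE = [
    "Oct 6 18:32:46 ip-12-0-10-30 systemd[1]: snapd.service: Failed with result 'watchdog'.",
    "Oct 6 18:32:50 ip-12-0-10-30 systemd[1]: snapd.service: Scheduled restart job, restart counter is at 3.",
    "Oct 6 18:32:53 ip-12-0-10-30 systemd[1]: snapd.service: Failed with result 'timeout'.",
    "Oct 6 19:07:10 ip-12-0-10-30 systemd[1]: snapd.service: Failed with result 'timeout'.",
    "Oct 6 19:14:41 ip-12-0-10-30 systemd[1]: snapd.service: Scheduled restart job, restart counter is at 20.",
    "Oct 6 19:31:10 ip-12-0-10-30 systemd-networkd[425]: ens5: Could not set DHCPv4 address: Connection timed out",
]


def summarize_snapd(lines):
    """Return (failure-result counts, highest restart counter, [(timestamp, message)])."""
    results = Counter()
    max_restart = 0
    timeline = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        # Only systemd itself reports unit results and restart counters.
        if not m or m.group("proc") != "systemd":
            continue
        msg = m.group("msg")
        result = RESULT_RE.search(msg)
        counter = COUNTER_RE.search(msg)
        if result:
            results[result.group("result")] += 1
        if counter:
            max_restart = max(max_restart, int(counter.group("n")))
        if result or counter:
            timeline.append((m.group("ts"), msg))
    return results, max_restart, timeline


if __name__ == "__main__":
    results, max_restart, timeline = summarize_snapd(SAMPLE)
    print("failure results:", dict(results))
    print("highest restart counter:", max_restart)
    for ts, msg in timeline:
        print(ts, "|", msg)
```

On a real instance the same function could be fed the lines of the system log file instead of `SAMPLE`; the resulting timeline can then be compared with the CPU and status-check times listed in the question.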
0 answers, 0 votes, 15 views, asked a month ago

Why are my EC2 instances not reporting their compliance status to SSM Patch Manager?

In SSM Patch Manager, under Compliance Reporting, our Amazon Linux 2 EC2 instances appear, but the 'Compliance status' column shows 'Never reported'. The instances appear in Fleet Manager with an 'SSM Agent ping status' of 'Online', and I can connect to the instances remotely using SSM `start-session`.

I've checked all the troubleshooting steps in the docs at [Troubleshooting SSM Agent](https://docs.aws.amazon.com/systems-manager/latest/userguide/troubleshooting-ssm-agent.html), [this article about SSM logs](https://aws.amazon.com/premiumsupport/knowledge-center/ssm-agent-logs/) and [Troubleshooting Patch Manager](https://docs.aws.amazon.com/systems-manager/latest/userguide/patch-manager-troubleshooting.html#patch-manager-troubleshooting-contact-support), and everything appears to be set up properly: the instance role has the right permissions, the named servers are reachable, and the instances can reach public S3 buckets via the internet (we're not using a VPC endpoint). I've also tried restarting the SSM Agent.

In the SSM Agent logs on the instance, I'm seeing:

```
2022-10-25 00:36:48 INFO [ssm-agent-worker] [StartupProcessor] Write to serial port: Amazon SSM Agent v3.1.1732.0 is running
...
2022-10-25 01:15:00 INFO [ssm-agent-worker] [HealthCheck] HealthCheck reporting agent health.
2022-10-25 01:16:48 INFO [ssm-agent-worker] [MessageService] [MessageHandler] started idempotency deletion thread
2022-10-25 01:16:48 WARN [ssm-agent-worker] [MessageService] [MessageHandler] [Idempotency] encountered error open /var/lib/amazon/ssm/i-XXXXXXXXXXXXXXXXX/idempotency: no such file or directory while listing directories in /var/lib/amazon/ssm/i-XXXXXXXXXXXXXXXXX/idempotency
2022-10-25 01:16:48 INFO [ssm-agent-worker] [MessageService] [MessageHandler] ended idempotency deletion thread
2022-10-25 01:16:50 INFO [ssm-agent-worker] [MessageService] [MGSInteractor] send failed reply thread started
2022-10-25 01:16:50 INFO [ssm-agent-worker] [MessageService] [MGSInteractor] send failed reply thread done
2022-10-25 01:17:05 INFO [ssm-agent-worker] [MessageService] [Association] Schedule manager refreshed with 0 associations, 0 new associations associated
2022-10-25 01:20:00 INFO [ssm-agent-worker] [HealthCheck] HealthCheck reporting agent health.
```

Any clues as to why the instances aren't reporting their compliance status to Patch Manager? What additional steps can I take to troubleshoot this?
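When scanning a longer SSM Agent log, it can be useful to separate the non-INFO entries and the association counts from the routine health-check lines. The sketch below assumes the entry format visible in the excerpt above (timestamp, level, bracketed component tags, message); `parse_entry`, `non_info`, `association_counts`, and `SAMPLE` are hypothetical names for illustration, and the output only restates what the log says.

```python
import re

# Entry format assumed from the excerpt:
# "<date> <time> <LEVEL> [component] [component] ... message"
ENTRY_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+(?P<level>[A-Z]+)\s+"
    r"(?P<tags>(?:\[[^\]]+\]\s*)+)(?P<msg>.*)$"
)
ASSOC_RE = re.compile(r"Schedule manager refreshed with (?P<n>\d+) associations")

# Lines copied from the excerpt above (hypothetical sample input).
SAMPLE = [
    "2022-10-25 01:15:00 INFO [ssm-agent-worker] [HealthCheck] HealthCheck reporting agent health.",
    "2022-10-25 01:16:48 WARN [ssm-agent-worker] [MessageService] [MessageHandler] [Idempotency] "
    "encountered error open /var/lib/amazon/ssm/i-XXXXXXXXXXXXXXXXX/idempotency: no such file or directory "
    "while listing directories in /var/lib/amazon/ssm/i-XXXXXXXXXXXXXXXXX/idempotency",
    "2022-10-25 01:17:05 INFO [ssm-agent-worker] [MessageService] [Association] "
    "Schedule manager refreshed with 0 associations, 0 new associations associated",
]


def parse_entry(line):
    """Split one log line into timestamp, level, component tags, and message."""
    m = ENTRY_RE.match(line.strip())
    if not m:
        return None  # e.g. the "..." elision line or a continuation line
    return {
        "ts": m.group("ts"),
        "level": m.group("level"),
        "tags": re.findall(r"\[([^\]]+)\]", m.group("tags")),
        "msg": m.group("msg"),
    }


def non_info(lines):
    """Return parsed entries whose level is anything other than INFO."""
    entries = (parse_entry(line) for line in lines)
    return [e for e in entries if e and e["level"] != "INFO"]


def association_counts(lines):
    """Return the association counts reported by the schedule manager, in order."""
    counts = []
    for line in lines:
        entry = parse_entry(line)
        match = ASSOC_RE.search(entry["msg"]) if entry else None
        if match:
            counts.append(int(match.group("n")))
    return counts


if __name__ == "__main__":
    for entry in non_info(SAMPLE):
        print(entry["ts"], entry["level"], "/".join(entry["tags"]), "|", entry["msg"])
    print("association counts:", association_counts(SAMPLE))
```

For the sample lines, this surfaces the single WARN entry from the Idempotency component and the association count of 0 that the agent logged.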
0 answers, 0 votes, 32 views, asked a month ago