I am running an EC2 instance (m6i.large) with Jenkins/PHP/MySQL plus some basic services. It ran fine on Ubuntu 18.x; yesterday I upgraded it to 20.x and then to 22.x. Since then the instance freezes (I am unable to SSH in) and the AWS Console reports "instance reachability check failed".
These are the last few lines from dmesg:
[ 5.988941] audit: type=1400 audit(1713911132.732:28): apparmor="STATUS" operation="profile_load" profile="unconfined" name="snap.firefox.hook.configure" pid=425 comm="apparmor_parser"
[ 5.993150] audit: type=1400 audit(1713911132.736:29): apparmor="STATUS" operation="profile_load" profile="unconfined" name="snap.firefox.hook.disconnect-plug-host-hunspell" pid=428 comm="apparmor_parser"
[ 6.002181] audit: type=1400 audit(1713911132.744:30): apparmor="STATUS" operation="profile_load" profile="unconfined" name="snap.firefox.hook.post-refresh" pid=429 comm="apparmor_parser"
[ 6.008875] parport_pc 00:03: reported by Plug and Play ACPI
[ 6.235431] ppdev: user-space parallel port driver
[ 9.785646] bridge: filtering via arp/ip/ip6tables is no longer available by default. Update your scripts to load br_netfilter if you need this.
[ 9.787920] Bridge firewalling registered
[ 9.900718] loop8: detected capacity change from 0 to 8
[ 9.984377] Initializing XFRM netlink socket
There are no errors in /var/log/*.log before the freeze.
htop shows normal or near-zero CPU and memory usage before the freeze, and there are no CPU spikes in the Monitoring tab of the AWS Console.
I have disabled all the cron jobs.
This is the current OS version:
$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 22.04.4 LTS
Release: 22.04
Codename: jammy
I have tried stopping all the services (Jenkins, Docker, PHP, Nginx, etc.), but the server still freezes almost exactly 1 hour after boot (plus 3 to 6 seconds). I looked at the output of ps -eaf and systemctl list-units --type=service --state=running and I don't see anything strange running there.
After I reboot, I am able to SSH and then it freezes again in 1 hour.
How can I investigate further to find out what is causing this?
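Since the freeze seems tied to uptime, one thing that might help is a heartbeat log that survives the hang, so the last recorded uptime can be compared against the 1-hour mark after a reboot. The following is only a minimal sketch under assumptions: the function names (heartbeat, format_uptime), the HEARTBEAT_LOG variable, and the log path are hypothetical, and it assumes a Linux system with /proc/uptime and /proc/loadavg.

```shell
#!/bin/sh
# Hypothetical heartbeat helper: names and paths are illustrative, not from any tool.
LOG="${HEARTBEAT_LOG:-/var/tmp/heartbeat.log}"

# Append one line with UTC wall-clock time, seconds since boot, and 1-minute load.
heartbeat() {
    printf '%s uptime=%s load=%s\n' \
        "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
        "$(cut -d' ' -f1 /proc/uptime)" \
        "$(cut -d' ' -f1 /proc/loadavg)" >> "$LOG"
    sync    # flush to disk so the line is not lost in a hard freeze
}

# Convert a seconds-since-boot value (e.g. 3604.12) to "<minutes>m <seconds>s".
format_uptime() {
    secs=${1%.*}    # drop the fractional part, if any
    printf '%dm %ds\n' $((secs / 60)) $((secs % 60))
}
```

To use it, the functions could be sourced and run in a background loop such as `while true; do heartbeat; sleep 5; done &`. After the next freeze and reboot, the uptime value on the last line of the log can be passed to format_uptime to see how close to the 1-hour mark the last heartbeat was written.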