Hi,
We have a small db.t4g.micro RDS (PostgreSQL) instance as a development database, with an RDS Proxy in front of it.
We started sending constant traffic to it over the last 12 hours or so, and we noticed that the instance restarted on its own twice.
Looking at the Monitoring tab in the RDS console, we observed that the restart times coincide exactly with the times that EBS Bytes Balance dropped to zero (CPU Credit Balance, Burst Balance and EBS IO Balance were all > 0).
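For anyone who wants to check the same correlation outside the console, here is a minimal sketch, not a definitive tool. It assumes boto3 is installed with credentials configured, and that the console graph corresponds to the CloudWatch metric `EBSByteBalance%` in the `AWS/RDS` namespace. The function names and the 5-minute window are hypothetical choices made for illustration.

```python
from datetime import datetime, timedelta, timezone


def fetch_ebs_byte_balance(instance_id, start, end, period=60):
    """Fetch per-period minimum EBSByteBalance% datapoints for one DB instance."""
    import boto3  # imported lazily so balance_near() works without AWS access

    cloudwatch = boto3.client("cloudwatch")
    response = cloudwatch.get_metric_statistics(
        Namespace="AWS/RDS",
        MetricName="EBSByteBalance%",
        Dimensions=[{"Name": "DBInstanceIdentifier", "Value": instance_id}],
        StartTime=start,
        EndTime=end,
        Period=period,
        Statistics=["Minimum"],
    )
    # CloudWatch does not guarantee datapoint order, so sort by timestamp.
    points = [(p["Timestamp"], p["Minimum"]) for p in response["Datapoints"]]
    return sorted(points)


def balance_near(points, event_time, window=timedelta(minutes=5)):
    """Return the lowest balance seen within `window` of event_time, or None."""
    nearby = [value for ts, value in points if abs(ts - event_time) <= window]
    return min(nearby) if nearby else None
```

Calling `balance_near(points, restart_time)` for each restart time gives the lowest balance in the surrounding window. A result of 0 would match what the console graph shows. It does not prove that the exhausted balance caused the restart.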
Excerpt from the error/postgresql.log:
2023-03-01 04:51:55 UTC::@:[375]:LOG: checkpoint complete: wrote 0 buffers (0.0%); 0 WAL file(s) added, 0 removed, 1 recycled; write=0.001 s, sync=0.001 s, total=1.754 s; sync files=0, longest=0.000 s, average=0.000 s; distance=65535 kB, estimate=65535 kB
2023-03-01 04:56:54 UTC::@:[375]:LOG: checkpoint starting: time
2023-03-01 04:56:54 UTC::@:[375]:LOG: checkpoint complete: wrote 0 buffers (0.0%); 0 WAL file(s) added, 0 removed, 1 recycled; write=0.001 s, sync=0.001 s, total=0.639 s; sync files=0, longest=0.000 s, average=0.000 s; distance=65535 kB, estimate=65535 kB
2023-03-01 04:57:21 UTC:172.31.5.233(18341):postgres@dns:[384]:FATAL: terminating connection due to administrator command
2023-03-01 04:57:21 UTC:172.31.5.233(64393):postgres@dns:[386]:FATAL: terminating connection due to administrator command
2023-03-01 04:57:21 UTC:172.31.45.245(10055):rdsproxyadmin@postgres:[31769]:FATAL: terminating connection due to administrator command
2023-03-01 04:57:21 UTC::@:[372]:LOG: received SIGHUP, reloading configuration files
2023-03-01 04:57:21 UTC:172.31.45.245(19305):postgres@dns:[388]:FATAL: terminating connection due to administrator command
2023-03-01 04:57:21 UTC:172.31.5.233(19611):rdsproxyadmin@postgres:[31768]:FATAL: terminating connection due to administrator command
2023-03-01 04:57:21 UTC::@:[372]:LOG: skipping missing configuration file "/rdsdbdata/config/recovery.conf"
2023-03-01 04:57:21 UTC::@:[375]:LOG: skipping missing configuration file "/rdsdbdata/config/recovery.conf"
2023-03-01 04:57:24 UTC::@:[373]:LOG: skipping missing configuration file "/rdsdbdata/config/recovery.conf"
2023-03-01 04:57:24 UTC::@:[373]:LOG: skipping missing configuration file "/rdsdbdata/config/recovery.conf"
2023-03-01 04:57:21 UTC:172.31.5.233(61767):postgres@dns:[383]:FATAL: terminating connection due to administrator command
2023-03-01 04:57:22 UTC::@:[31837]:FATAL: terminating autovacuum process due to administrator command
2023-03-01 04:57:23 UTC::@:[379]:LOG: skipping missing configuration file "/rdsdbdata/config/recovery.conf"
2023-03-01 04:57:21 UTC:172.31.45.245(10079):postgres@dns:[401]:FATAL: terminating connection due to administrator command
2023-03-01 04:57:23 UTC::@:[375]:LOG: skipping missing configuration file "/rdsdbdata/config/recovery.conf"
2023-03-01 04:57:23 UTC::@:[379]:LOG: skipping missing configuration file "/rdsdbdata/config/recovery.conf"
2023-03-01 04:57:21 UTC:172.31.45.245(51481):postgres@dns:[389]:FATAL: terminating connection due to administrator command
2023-03-01 04:57:22 UTC:172.31.5.233(52633):postgres@dns:[385]:FATAL: terminating connection due to administrator command
2023-03-01 04:57:23 UTC:172.31.5.233(17531):postgres@dns:[382]:FATAL: terminating connection due to administrator command
2023-03-01 04:57:24 UTC::@:[377]:LOG: skipping missing configuration file "/rdsdbdata/config/recovery.conf"
2023-03-01 04:57:25 UTC::@:[372]:LOG: received smart shutdown request
2023-03-01 04:57:29 UTC::@:[372]:LOG: background worker "logical replication launcher" (PID 380) exited with exit code 1
2023-03-01 04:57:29 UTC::@:[375]:LOG: shutting down
2023-03-01 04:57:29 UTC::@:[375]:LOG: checkpoint starting: shutdown immediate
2023-03-01 04:57:29 UTC::@:[375]:LOG: checkpoint complete: wrote 0 buffers (0.0%); 0 WAL file(s) added, 0 removed, 1 recycled; write=0.001 s, sync=0.001 s, total=0.308 s; sync files=0, longest=0.000 s, average=0.000 s; distance=65535 kB, estimate=65535 kB
2023-03-01 04:57:31 UTC:127.0.0.1(49550):rdsadmin@rdsadmin:[31839]:FATAL: the database system is shutting down
2023-03-01 04:57:32 UTC::@:[372]:LOG: database system is shut down
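To pull the shutdown sequence out of a longer log file, a rough sketch like the following may help. It assumes every line uses the prefix layout seen in the excerpt above (timestamp, remote host, user@database, PID, level). It would need adjusting for a different `log_line_prefix` or for IPv6 client addresses. The names `parse_line` and `shutdown_events` are made up for this example.

```python
import re
from datetime import datetime, timezone

# Matches lines such as:
# 2023-03-01 04:57:25 UTC::@:[372]:LOG: received smart shutdown request
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) UTC:"
    r"(?P<remote>[^:]*):(?P<user>[^@:]*)@(?P<db>[^:]*):"
    r"\[(?P<pid>\d+)\]:(?P<level>[A-Z]+):\s*(?P<msg>.*)$"
)


def parse_line(line):
    """Split one log line into its fields, or return None if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    record = match.groupdict()
    record["ts"] = datetime.strptime(record["ts"], "%Y-%m-%d %H:%M:%S").replace(
        tzinfo=timezone.utc
    )
    record["pid"] = int(record["pid"])
    return record


def shutdown_events(lines):
    """Return (timestamp, pid, message) for shutdown-related log entries."""
    events = []
    for line in lines:
        record = parse_line(line)
        if record is None:
            continue
        msg = record["msg"]
        is_request = msg.startswith("received ") and "shutdown request" in msg
        if is_request or msg == "database system is shut down":
            events.append((record["ts"], record["pid"], msg))
    return events
```

Run against the excerpt above, this would pick out the "received smart shutdown request" entry at 04:57:25 and the "database system is shut down" entry at 04:57:32, both from PID 372. That shows an orderly shutdown request to the postmaster, not a crash.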
Should EBS Bytes Balance exhaustion really lead to the database spontaneously restarting?
Thanks,
Nikos
Hi @philaws,
The instance was restarted twice, several hours apart. Do you think there were two hardware failures? What do you base this conclusion on?