RDS instance restarted when EBS Bytes Balance exhausted


Hi,

We have a small db.t4g.micro RDS (PostgreSQL) instance as a development database, with an RDS Proxy in front of it. For the last 12 hours or so we have been sending it constant traffic, and we noticed that the instance restarted on its own twice. On the Monitoring tab of the RDS console, the restart times coincide exactly with the times that EBS Bytes Balance dropped to zero. (CPU Credit Balance, Burst Balance, and EBS IO Balance were all above 0.) Excerpt from error/postgresql.log:

2023-03-01 04:51:55 UTC::@:[375]:LOG: checkpoint complete: wrote 0 buffers (0.0%); 0 WAL file(s) added, 0 removed, 1 recycled; write=0.001 s, sync=0.001 s, total=1.754 s; sync files=0, longest=0.000 s, average=0.000 s; distance=65535 kB, estimate=65535 kB
2023-03-01 04:56:54 UTC::@:[375]:LOG: checkpoint starting: time
2023-03-01 04:56:54 UTC::@:[375]:LOG: checkpoint complete: wrote 0 buffers (0.0%); 0 WAL file(s) added, 0 removed, 1 recycled; write=0.001 s, sync=0.001 s, total=0.639 s; sync files=0, longest=0.000 s, average=0.000 s; distance=65535 kB, estimate=65535 kB
2023-03-01 04:57:21 UTC:172.31.5.233(18341):postgres@dns:[384]:FATAL: terminating connection due to administrator command
2023-03-01 04:57:21 UTC:172.31.5.233(64393):postgres@dns:[386]:FATAL: terminating connection due to administrator command
2023-03-01 04:57:21 UTC:172.31.45.245(10055):rdsproxyadmin@postgres:[31769]:FATAL: terminating connection due to administrator command
2023-03-01 04:57:21 UTC::@:[372]:LOG: received SIGHUP, reloading configuration files
2023-03-01 04:57:21 UTC:172.31.45.245(19305):postgres@dns:[388]:FATAL: terminating connection due to administrator command
2023-03-01 04:57:21 UTC:172.31.5.233(19611):rdsproxyadmin@postgres:[31768]:FATAL: terminating connection due to administrator command
2023-03-01 04:57:21 UTC::@:[372]:LOG: skipping missing configuration file "/rdsdbdata/config/recovery.conf"
2023-03-01 04:57:21 UTC::@:[375]:LOG: skipping missing configuration file "/rdsdbdata/config/recovery.conf"
2023-03-01 04:57:24 UTC::@:[373]:LOG: skipping missing configuration file "/rdsdbdata/config/recovery.conf"
2023-03-01 04:57:24 UTC::@:[373]:LOG: skipping missing configuration file "/rdsdbdata/config/recovery.conf"
2023-03-01 04:57:21 UTC:172.31.5.233(61767):postgres@dns:[383]:FATAL: terminating connection due to administrator command
2023-03-01 04:57:22 UTC::@:[31837]:FATAL: terminating autovacuum process due to administrator command
2023-03-01 04:57:23 UTC::@:[379]:LOG: skipping missing configuration file "/rdsdbdata/config/recovery.conf"
2023-03-01 04:57:21 UTC:172.31.45.245(10079):postgres@dns:[401]:FATAL: terminating connection due to administrator command
2023-03-01 04:57:23 UTC::@:[375]:LOG: skipping missing configuration file "/rdsdbdata/config/recovery.conf"
2023-03-01 04:57:23 UTC::@:[379]:LOG: skipping missing configuration file "/rdsdbdata/config/recovery.conf"
2023-03-01 04:57:21 UTC:172.31.45.245(51481):postgres@dns:[389]:FATAL: terminating connection due to administrator command
2023-03-01 04:57:22 UTC:172.31.5.233(52633):postgres@dns:[385]:FATAL: terminating connection due to administrator command
2023-03-01 04:57:23 UTC:172.31.5.233(17531):postgres@dns:[382]:FATAL: terminating connection due to administrator command
2023-03-01 04:57:24 UTC::@:[377]:LOG: skipping missing configuration file "/rdsdbdata/config/recovery.conf"
2023-03-01 04:57:25 UTC::@:[372]:LOG: received smart shutdown request
2023-03-01 04:57:29 UTC::@:[372]:LOG: background worker "logical replication launcher" (PID 380) exited with exit code 1
2023-03-01 04:57:29 UTC::@:[375]:LOG: shutting down
2023-03-01 04:57:29 UTC::@:[375]:LOG: checkpoint starting: shutdown immediate
2023-03-01 04:57:29 UTC::@:[375]:LOG: checkpoint complete: wrote 0 buffers (0.0%); 0 WAL file(s) added, 0 removed, 1 recycled; write=0.001 s, sync=0.001 s, total=0.308 s; sync files=0, longest=0.000 s, average=0.000 s; distance=65535 kB, estimate=65535 kB
2023-03-01 04:57:31 UTC:127.0.0.1(49550):rdsadmin@rdsadmin:[31839]:FATAL: the database system is shutting down
2023-03-01 04:57:32 UTC::@:[372]:LOG: database system is shut down

Should EBS Bytes Balance exhaustion really lead to the database spontaneously restarting?

Thanks, Nikos

2 Answers
Accepted Answer

Hi. Yes, under some circumstances, such as an excessive workload that causes a performance bottleneck and resource contention, RDS can restart on its own. See https://aws.amazon.com/premiumsupport/knowledge-center/rds-multi-az-failover-restart/.

Therefore, if the instance restarts frequently due to a lack of resources, consider scaling up to a larger DB instance class or looking into Provisioned IOPS SSD storage to keep up with the increasing demands of your applications.
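
To check whether a restart really lines up with an exhausted byte balance, something like the following might help. It is only a rough sketch: it assumes the `EBSByteBalance%` metric in the `AWS/RDS` CloudWatch namespace, the instance identifier is whatever you pass in, and the 10-minute window and 1% threshold are arbitrary choices, not AWS recommendations. The function names are made up for illustration.

```python
from datetime import datetime, timedelta, timezone


def exhausted_near(datapoints, restart_time,
                   window=timedelta(minutes=10), threshold=1.0):
    """Return datapoints at or below `threshold` percent that fall
    within `window` before `restart_time` (both values are assumptions)."""
    return [
        dp for dp in datapoints
        if restart_time - window <= dp["Timestamp"] <= restart_time
        and dp["Minimum"] <= threshold
    ]


def fetch_byte_balance(instance_id, start, end):
    """Fetch 5-minute minimums of EBSByteBalance% for one DB instance."""
    import boto3  # imported lazily so exhausted_near() works without AWS access

    cloudwatch = boto3.client("cloudwatch")
    response = cloudwatch.get_metric_statistics(
        Namespace="AWS/RDS",
        MetricName="EBSByteBalance%",
        Dimensions=[{"Name": "DBInstanceIdentifier", "Value": instance_id}],
        StartTime=start,
        EndTime=end,
        Period=300,
        Statistics=["Minimum"],
    )
    # CloudWatch does not guarantee ordering, so sort by timestamp.
    return sorted(response["Datapoints"], key=lambda dp: dp["Timestamp"])
```

For the shutdown in the log above, you would pass the "received smart shutdown request" time (2023-03-01 04:57:25 UTC) as `restart_time`. A non-empty result only shows that the two events coincide; it does not prove that one caused the other.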

AWS
JS-AWS
answered a year ago

Nikos, I think in your case there was a hardware failure that triggered RDS automation to replace your underlying database instance. This was not related to your workload.

AWS
MODERATOR
philaws
answered a year ago
  • Hi @philaws,

    The instance was restarted twice, several hours apart. Do you think there were two hardware failures? What do you base this conclusion on?
