RDS instance restarted when EBS Bytes Balance exhausted


Hi,

We have a small db.t4g.micro RDS (PostgreSQL) instance as a development database, with an RDS Proxy in front of it. We started sending constant traffic to it over the last 12 hours or so, and we noticed that the instance restarted on its own twice. Looking at the Monitoring tab in the RDS console, we observed that the restart times coincide exactly with the times that EBS Bytes Balance dropped to zero (CPU Credit Balance, Burst Balance, and EBS IO Balance were all > 0). Excerpt from error/postgresql.log:

2023-03-01 04:51:55 UTC::@:[375]:LOG: checkpoint complete: wrote 0 buffers (0.0%); 0 WAL file(s) added, 0 removed, 1 recycled; write=0.001 s, sync=0.001 s, total=1.754 s; sync files=0, longest=0.000 s, average=0.000 s; distance=65535 kB, estimate=65535 kB
2023-03-01 04:56:54 UTC::@:[375]:LOG: checkpoint starting: time
2023-03-01 04:56:54 UTC::@:[375]:LOG: checkpoint complete: wrote 0 buffers (0.0%); 0 WAL file(s) added, 0 removed, 1 recycled; write=0.001 s, sync=0.001 s, total=0.639 s; sync files=0, longest=0.000 s, average=0.000 s; distance=65535 kB, estimate=65535 kB
2023-03-01 04:57:21 UTC:172.31.5.233(18341):postgres@dns:[384]:FATAL: terminating connection due to administrator command
2023-03-01 04:57:21 UTC:172.31.5.233(64393):postgres@dns:[386]:FATAL: terminating connection due to administrator command
2023-03-01 04:57:21 UTC:172.31.45.245(10055):rdsproxyadmin@postgres:[31769]:FATAL: terminating connection due to administrator command
2023-03-01 04:57:21 UTC::@:[372]:LOG: received SIGHUP, reloading configuration files
2023-03-01 04:57:21 UTC:172.31.45.245(19305):postgres@dns:[388]:FATAL: terminating connection due to administrator command
2023-03-01 04:57:21 UTC:172.31.5.233(19611):rdsproxyadmin@postgres:[31768]:FATAL: terminating connection due to administrator command
2023-03-01 04:57:21 UTC::@:[372]:LOG: skipping missing configuration file "/rdsdbdata/config/recovery.conf"
2023-03-01 04:57:21 UTC::@:[375]:LOG: skipping missing configuration file "/rdsdbdata/config/recovery.conf"
2023-03-01 04:57:24 UTC::@:[373]:LOG: skipping missing configuration file "/rdsdbdata/config/recovery.conf"
2023-03-01 04:57:24 UTC::@:[373]:LOG: skipping missing configuration file "/rdsdbdata/config/recovery.conf"
2023-03-01 04:57:21 UTC:172.31.5.233(61767):postgres@dns:[383]:FATAL: terminating connection due to administrator command
2023-03-01 04:57:22 UTC::@:[31837]:FATAL: terminating autovacuum process due to administrator command
2023-03-01 04:57:23 UTC::@:[379]:LOG: skipping missing configuration file "/rdsdbdata/config/recovery.conf"
2023-03-01 04:57:21 UTC:172.31.45.245(10079):postgres@dns:[401]:FATAL: terminating connection due to administrator command
2023-03-01 04:57:23 UTC::@:[375]:LOG: skipping missing configuration file "/rdsdbdata/config/recovery.conf"
2023-03-01 04:57:23 UTC::@:[379]:LOG: skipping missing configuration file "/rdsdbdata/config/recovery.conf"
2023-03-01 04:57:21 UTC:172.31.45.245(51481):postgres@dns:[389]:FATAL: terminating connection due to administrator command
2023-03-01 04:57:22 UTC:172.31.5.233(52633):postgres@dns:[385]:FATAL: terminating connection due to administrator command
2023-03-01 04:57:23 UTC:172.31.5.233(17531):postgres@dns:[382]:FATAL: terminating connection due to administrator command
2023-03-01 04:57:24 UTC::@:[377]:LOG: skipping missing configuration file "/rdsdbdata/config/recovery.conf"
2023-03-01 04:57:25 UTC::@:[372]:LOG: received smart shutdown request
2023-03-01 04:57:29 UTC::@:[372]:LOG: background worker "logical replication launcher" (PID 380) exited with exit code 1
2023-03-01 04:57:29 UTC::@:[375]:LOG: shutting down
2023-03-01 04:57:29 UTC::@:[375]:LOG: checkpoint starting: shutdown immediate
2023-03-01 04:57:29 UTC::@:[375]:LOG: checkpoint complete: wrote 0 buffers (0.0%); 0 WAL file(s) added, 0 removed, 1 recycled; write=0.001 s, sync=0.001 s, total=0.308 s; sync files=0, longest=0.000 s, average=0.000 s; distance=65535 kB, estimate=65535 kB
2023-03-01 04:57:31 UTC:127.0.0.1(49550):rdsadmin@rdsadmin:[31839]:FATAL: the database system is shutting down
2023-03-01 04:57:32 UTC::@:[372]:LOG: database system is shut down

Should EBS Bytes Balance exhaustion really lead to the database spontaneously restarting?

Thanks, Nikos

2 Answers
Accepted answer

Hi. Yes, under some circumstances, such as an excessive workload that causes a performance bottleneck and resource contention, RDS can restart on its own. See https://aws.amazon.com/premiumsupport/knowledge-center/rds-multi-az-failover-restart/.

Therefore, if the instance restarts frequently due to a lack of resources, consider scaling up your database instance: look into Provisioned IOPS SSD storage, or upgrade to a larger instance class, to keep up with the increasing demands of your applications.
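To check whether the restarts really line up with the balance running out, something like the following might help. It is only a minimal sketch: it pulls shutdown timestamps out of postgresql.log lines in the format shown in the question, and compares them with metric datapoints that you supply yourself (for example, exported from CloudWatch, where I believe the RDS metric is named `EBSByteBalance%`). The function names, the 10-minute window, and the 1.0 percent threshold are my own assumptions, not anything defined by RDS.

```python
from datetime import datetime, timedelta, timezone

# Text PostgreSQL logs when a normal ("smart") shutdown is requested.
SHUTDOWN_MARKER = "received smart shutdown request"


def shutdown_times(log_lines):
    """Return the UTC timestamps of log lines containing the shutdown marker.

    Assumes each line starts with "YYYY-MM-DD HH:MM:SS UTC", as in the
    excerpt from the question.
    """
    times = []
    for line in log_lines:
        if SHUTDOWN_MARKER in line:
            stamp = datetime.strptime(line[:19], "%Y-%m-%d %H:%M:%S")
            times.append(stamp.replace(tzinfo=timezone.utc))
    return times


def restarts_near_depletion(restarts, datapoints,
                            window=timedelta(minutes=10), threshold=1.0):
    """Map each restart time to the low-balance datapoints just before it.

    datapoints is a list of (timestamp, balance_percent) tuples. A datapoint
    counts if its value is <= threshold (hypothetical cutoff) and it falls
    within `window` before the restart.
    """
    result = {}
    for restart in restarts:
        result[restart] = [
            (ts, value)
            for ts, value in datapoints
            if value <= threshold and timedelta(0) <= restart - ts <= window
        ]
    return result


if __name__ == "__main__":
    log = [
        "2023-03-01 04:57:25 UTC::@:[372]:LOG: received smart shutdown request",
    ]
    # Made-up datapoints purely for illustration, not real measurements.
    points = [
        (datetime(2023, 3, 1, 4, 45, tzinfo=timezone.utc), 12.0),
        (datetime(2023, 3, 1, 4, 55, tzinfo=timezone.utc), 0.0),
    ]
    for restart, hits in restarts_near_depletion(shutdown_times(log), points).items():
        print(restart.isoformat(), "low-balance datapoints before restart:", len(hits))
```

If every restart has a low-balance datapoint just before it and there are none at other times, that would at least support the resource-exhaustion explanation, although it would not prove it.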

AWS
JS-AWS
answered a year ago

Nikos, I think in your case there was a hardware failure that triggered RDS automation to replace your underlying database instance. This was not related to your workload.

AWS
MODERATOR
philaws
answered a year ago
  • Hi @philaws,

    The instance was restarted twice, several hours apart. Do you think there were two hardware failures? What do you base this conclusion on?
