How to fix a terminated RDS Postgres read replica


We ran into a strange situation in which replication for one of our RDS Postgres read replicas was terminated, as shown in the screenshot below. We have not been able to find a root cause, and we have not yet been able to fix it either.

[Screenshot: replication status of the read replica shown as terminated]

Here is the only log output we could find in the RDS read replica logs that might be related:

000000010002F94800000019 archive /rdsdbdata/log/restore/pg-wal-archive.12472857.* is not yet downloaded, exiting restore script for now
2022-12-11 09:59:59 UTC::@:[29307]:LOG: started streaming WAL from primary at 2F948/64000000 on timeline 1
2022-12-11 09:59:59 UTC::@:[29307]:FATAL: could not receive data from WAL stream: ERROR: requested WAL segment 000000010002F94800000019 has already been removed
recovering 00000002.history

Has anyone come across this issue and know how to fix the replication?

Thanks in advance! Mehdi

1 Answer

Hi, if you are running Postgres 9 or later, this may be caused by a bug in WAL handling during replication. Try upgrading to a more recent minor version, for example one of these: https://www.postgresql.org/about/news/postgresql-123-118-1013-9618-and-9522-released-2038/

I see that one of your read replicas is still replicating successfully, so for the terminated one, please create a new read replica to replace it. As best practices, you can also:

  1. Set up the read replica as Multi-AZ for higher availability.
  2. Improve operational excellence by subscribing to RDS events (such as DB instance events for read replicas), so that if a read replica fails you can trigger a workflow to create a new one. https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_Events.Messages.html#USER_Events.Messages.cluster Thanks.
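As a rough illustration of the steps above, here is a minimal Python sketch. It assumes you pass in a boto3 RDS client (`boto3.client("rds")`). The instance identifiers, the subscription name, and the SNS topic ARN are hypothetical placeholders, and the choice of event categories is an assumption that you should check against the RDS event documentation for your setup.

```python
def replication_status(rds, replica_id):
    """Return the 'read replication' status of a replica, or None if absent."""
    resp = rds.describe_db_instances(DBInstanceIdentifier=replica_id)
    for info in resp["DBInstances"][0].get("StatusInfos", []):
        if info.get("StatusType") == "read replication":
            return info.get("Status")
    return None


def replace_replica_if_terminated(rds, source_id, replica_id, new_replica_id):
    """Create a new Multi-AZ read replica if the old one is terminated.

    Returns True if a new replica was requested, False otherwise.
    The old replica is left in place; delete it separately once the
    new one is available.
    """
    if replication_status(rds, replica_id) != "terminated":
        return False
    rds.create_db_instance_read_replica(
        DBInstanceIdentifier=new_replica_id,   # hypothetical new name
        SourceDBInstanceIdentifier=source_id,  # the primary instance
        MultiAZ=True,                          # best practice 1
    )
    return True


def subscribe_to_replica_events(rds, topic_arn, replica_id,
                                name="replica-events"):
    """Send read replica and failure events for one instance to SNS."""
    rds.create_event_subscription(
        SubscriptionName=name,                 # hypothetical name
        SnsTopicArn=topic_arn,                 # your own SNS topic
        SourceType="db-instance",
        EventCategories=["read replica", "failure"],  # assumed categories
        SourceIds=[replica_id],
        Enabled=True,
    )
```

With boto3 installed and credentials configured, you would call these as `replace_replica_if_terminated(boto3.client("rds"), "my-primary", "my-replica", "my-replica-2")`. A Lambda function subscribed to the SNS topic could run the same check to automate the replacement.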
AWS
answered a year ago
