How to fix a terminated RDS Postgres read replica

We ran into a strange situation in which replication for one of our RDS Postgres read replicas was terminated, as shown in the screenshot below. We have not been able to find a root cause, and we have not been able to fix it yet.

[Screenshot: RDS replication status, with one read replica shown as terminated]

Here is the only log output we could find in the RDS read replica logs that might be related:

000000010002F94800000019 archive /rdsdbdata/log/restore/pg-wal-archive.12472857.* is not yet downloaded, exiting restore script for now
2022-12-11 09:59:59 UTC::@:[29307]:LOG: started streaming WAL from primary at 2F948/64000000 on timeline 1
2022-12-11 09:59:59 UTC::@:[29307]:FATAL: could not receive data from WAL stream: ERROR: requested WAL segment 000000010002F94800000019 has already been removed
recovering 00000002.history
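
To help read the log above, here is a minimal Python sketch that decodes the WAL segment file name from the FATAL line into a timeline and a starting WAL position. The function name `decode_wal_segment` is hypothetical, and the 64 MiB segment size is an assumption: it is chosen because the decoded position then matches the "started streaming WAL from primary at 2F948/64000000" line.

```python
def decode_wal_segment(name: str, wal_segment_size: int = 64 * 1024 * 1024):
    """Decode a 24-hex-digit WAL segment file name.

    Returns (timeline, start_lsn), where start_lsn is formatted like the
    positions PostgreSQL prints in its logs (HIGH/LOW in hex).
    The default segment size of 64 MiB is an assumption, not taken from
    the instance configuration.
    """
    if len(name) != 24:
        raise ValueError("expected a 24-character WAL segment name")

    timeline = int(name[0:8], 16)     # first 8 hex digits: timeline ID
    log_id = int(name[8:16], 16)      # next 8: high 32 bits of the position
    seg_in_log = int(name[16:24], 16) # last 8: segment index within that range

    offset = seg_in_log * wal_segment_size
    if offset >= 2 ** 32:
        raise ValueError("segment index does not fit this segment size")

    return timeline, f"{log_id:X}/{offset:X}"


print(decode_wal_segment("000000010002F94800000019"))
```

Under that assumption, the segment named in the error begins at the same position the replica asked to start streaming from, which suggests the replica needed exactly the segment that the primary had already removed.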

Has anyone come across this issue and know how to fix the replication?

Thanks in advance! Mehdi

Mehdi
asked 1 year ago, 253 views
1 Answer

Hi. If you are running PostgreSQL 9 or later, this may be caused by a WAL-related bug in the replication process. Try upgrading to an updated minor release: https://www.postgresql.org/about/news/postgresql-123-118-1013-9618-and-9522-released-2038/

I see that one of your read replicas is still replicating successfully, so for the terminated one, please create a new read replica to replace it. As best practices, you can also:

  1. Set up the read replica as Multi-AZ for higher availability.
  2. Improve operational excellence by subscribing to RDS events (such as DB instance events for read replicas), so that if a read replica crashes you can trigger a workflow to create a new one: https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_Events.Messages.html#USER_Events.Messages.cluster

Thanks.
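
As a rough illustration of the suggestions in this answer, the following Python sketch creates a replacement Multi-AZ read replica and subscribes an SNS topic to read replica events. It assumes a boto3 RDS client is passed in as `rds`. The helper names, instance identifiers, subscription name, and topic ARN are all hypothetical placeholders, and the chosen event categories are one plausible selection rather than a recommendation from the original answer.

```python
def recreate_read_replica(rds, source_id, new_replica_id, multi_az=True):
    """Create a new read replica of `source_id` to replace a terminated one.

    `rds` is expected to be a boto3 RDS client, e.g. boto3.client("rds").
    """
    return rds.create_db_instance_read_replica(
        DBInstanceIdentifier=new_replica_id,
        SourceDBInstanceIdentifier=source_id,
        MultiAZ=multi_az,  # item 1 in the list: Multi-AZ replica
    )


def subscribe_to_replica_events(rds, subscription_name, sns_topic_arn, replica_ids):
    """Send read replica and failure events for the given instances to SNS."""
    return rds.create_event_subscription(
        SubscriptionName=subscription_name,
        SnsTopicArn=sns_topic_arn,
        SourceType="db-instance",
        EventCategories=["read replica", "failure"],  # assumed selection
        SourceIds=list(replica_ids),
        Enabled=True,
    )


# Example usage (placeholder identifiers, requires boto3 and AWS credentials):
#   import boto3
#   rds = boto3.client("rds")
#   recreate_read_replica(rds, "my-primary", "my-replica-2")
#   subscribe_to_replica_events(
#       rds, "replica-alerts",
#       "arn:aws:sns:us-east-1:123456789012:rds-events", ["my-replica-2"])
```

The SNS topic could then trigger whatever workflow you use to create a new replica. The terminated replica still has to be deleted separately once the replacement is available.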
AWS
answered 1 year ago
