How to fix a terminated RDS Postgres replica

We came across a strange scenario in which replication for one of our RDS Postgres read replicas was terminated, as shown in the screenshot below. We could not find a root cause for this issue, and we have not been able to fix it yet.

[Screenshot: read replica with replication shown as terminated]

Here is the only log output that might be related, which we found in our RDS read replica logs:

000000010002F94800000019 archive /rdsdbdata/log/restore/pg-wal-archive.12472857.* is not yet downloaded, exiting restore script for now
2022-12-11 09:59:59 UTC::@:[29307]:LOG: started streaming WAL from primary at 2F948/64000000 on timeline 1
2022-12-11 09:59:59 UTC::@:[29307]:FATAL: could not receive data from WAL stream: ERROR: requested WAL segment 000000010002F94800000019 has already been removed
recovering 00000002.history
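
For anyone reading the log above, here is a minimal sketch of how the segment name in the FATAL line can be decoded. It assumes 64 MiB WAL segments (stock PostgreSQL defaults to 16 MiB, so check `SHOW wal_segment_size;` on the instance first); the function names are only illustrative. Under that assumption, the removed segment starts at exactly the LSN the replica tried to stream from, which suggests the primary no longer has the WAL the replica needs to catch up.

```python
import re

# Assumption: 64 MiB WAL segments. Stock PostgreSQL defaults to 16 MiB,
# so verify with `SHOW wal_segment_size;` before relying on this value.
WAL_SEGMENT_SIZE = 64 * 1024 * 1024

REMOVED_SEGMENT = re.compile(
    r"requested WAL segment ([0-9A-F]{24}) has already been removed"
)


def removed_segments(log_text):
    """Return WAL segment names the primary reported as already removed."""
    return REMOVED_SEGMENT.findall(log_text)


def segment_start_lsn(segment_name, segment_size=WAL_SEGMENT_SIZE):
    """Split a 24-hex-digit WAL file name into (timeline, start LSN)."""
    timeline = int(segment_name[0:8], 16)
    log_id = int(segment_name[8:16], 16)   # high 32 bits of the LSN
    seg_no = int(segment_name[16:24], 16)  # segment index within log_id
    offset = seg_no * segment_size         # low 32 bits of the LSN
    return timeline, f"{log_id:X}/{offset:X}"


log = (
    "2022-12-11 09:59:59 UTC::@:[29307]:FATAL: could not receive data from "
    "WAL stream: ERROR: requested WAL segment 000000010002F94800000019 "
    "has already been removed"
)
for name in removed_segments(log):
    print(name, segment_start_lsn(name))
# prints: 000000010002F94800000019 (1, '2F948/64000000')
```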

I was wondering whether anyone has come across this issue and knows how to fix the replication.

Thanks in advance! Mehdi

Mehdi
asked 1 year ago, 253 views
1 Answer

Hi, if you are using Postgres 9+, this may be caused by a WAL bug in the replication process. Try upgrading to a newer minor version: https://www.postgresql.org/about/news/postgresql-123-118-1013-9618-and-9522-released-2038/

I saw that there is one healthy read replica, so for the terminated one, please create a new read replica to replace it. As best practices, you can also:

  1. Set up the read replica as Multi-AZ for higher availability.
  2. Set up operational excellence by subscribing to RDS events (such as DB instance events for read replicas), so that if a read replica fails you can trigger a workflow to create a new one: https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_Events.Messages.html#USER_Events.Messages.cluster

Thanks.
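
The suggestions above could be sketched with boto3 roughly as follows. This is only an illustration under assumptions: the helper names, the subscription name, and the choice of event categories are hypothetical, the new replica identifier must not already be in use, and without `SourceIds` the subscription covers all DB instances in the account and region.

```python
def replica_request(source_id, replica_id, multi_az=True):
    """Build arguments for rds.create_db_instance_read_replica."""
    return {
        "DBInstanceIdentifier": replica_id,       # name of the new replica
        "SourceDBInstanceIdentifier": source_id,  # the primary instance
        "MultiAZ": multi_az,                      # standby in a second AZ
    }


def event_subscription_request(name, sns_topic_arn):
    """Build arguments for rds.create_event_subscription."""
    return {
        "SubscriptionName": name,
        "SnsTopicArn": sns_topic_arn,
        "SourceType": "db-instance",
        # Assumed categories of interest; adjust to your needs.
        "EventCategories": ["failure", "read replica"],
        "Enabled": True,
    }


def recreate_replica_with_alerts(source_id, replica_id, sns_topic_arn):
    """Create a replacement replica and subscribe an SNS topic to RDS events."""
    import boto3  # imported lazily so the helpers above work without it

    rds = boto3.client("rds")
    rds.create_db_instance_read_replica(**replica_request(source_id, replica_id))
    rds.create_event_subscription(
        **event_subscription_request(f"{replica_id}-events", sns_topic_arn)
    )
```

The SNS topic can then feed whatever workflow you use to react to a failed replica; the terminated replica itself still has to be deleted separately once the new one is healthy.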
AWS
answered 1 year ago
