How to fix a terminated RDS Postgres replica


We came across a strange scenario in which replication for one of our RDS Postgres read replicas has been terminated, as shown in the screenshot below. We could not find a root cause, and we have not been able to fix the issue yet either.

[Screenshot: read replica with replication terminated]

Here is the only log output we could find in our RDS read replica logs that might be related:

```
000000010002F94800000019 archive /rdsdbdata/log/restore/pg-wal-archive.12472857.* is not yet downloaded, exiting restore script for now
2022-12-11 09:59:59 UTC::@:[29307]:LOG: started streaming WAL from primary at 2F948/64000000 on timeline 1
2022-12-11 09:59:59 UTC::@:[29307]:FATAL: could not receive data from WAL stream: ERROR: requested WAL segment 000000010002F94800000019 has already been removed
recovering 00000002.history
```
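The two log lines can be cross-checked against each other: a WAL segment file name is 24 hex digits, namely the timeline ID followed by the segment number split into two 8-digit halves, so it can be derived from an LSN such as `2F948/64000000`. The sketch below reproduces that mapping. It assumes a 64 MiB WAL segment size; that value is an assumption, inferred only because it is the size at which the start position in the `LOG` line maps exactly to the segment named in the `FATAL` line. With the PostgreSQL default of 16 MiB, the same LSN would map to a different file name.

```python
"""Map a PostgreSQL LSN to a WAL segment file name and back (illustrative sketch)."""
from typing import Tuple

# Assumption: 64 MiB segments (inferred from the log above); PostgreSQL's default is 16 MiB.
WAL_SEGMENT_SIZE = 64 * 1024 * 1024
XLOG_ID_SPAN = 0x100000000  # bytes covered by one value of the LSN's high 32 bits


def parse_lsn(lsn: str) -> int:
    """Convert the textual form 'HI/LO' (both hex) into a 64-bit integer."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def format_lsn(value: int) -> str:
    """Convert a 64-bit integer back into the 'HI/LO' text form."""
    return f"{value >> 32:X}/{value & 0xFFFFFFFF:X}"


def wal_file_name(timeline: int, lsn: str, seg_size: int = WAL_SEGMENT_SIZE) -> str:
    """Name of the segment file that contains the byte at `lsn`."""
    segno = parse_lsn(lsn) // seg_size
    segs_per_id = XLOG_ID_SPAN // seg_size
    return f"{timeline:08X}{segno // segs_per_id:08X}{segno % segs_per_id:08X}"


def segment_start_lsn(file_name: str, seg_size: int = WAL_SEGMENT_SIZE) -> Tuple[int, str]:
    """Return (timeline, LSN of the segment's first byte) for a 24-hex-digit name."""
    if len(file_name) != 24:
        raise ValueError("WAL segment names are 24 hex digits")
    timeline = int(file_name[0:8], 16)
    log_id = int(file_name[8:16], 16)
    seg_in_id = int(file_name[16:24], 16)
    return timeline, format_lsn((log_id << 32) + seg_in_id * seg_size)


print(wal_file_name(1, "2F948/64000000"))            # → 000000010002F94800000019
print(wal_file_name(1, "2F948/64000000", 16 << 20))  # → 000000010002F94800000064
print(segment_start_lsn("000000010002F94800000019"))  # → (1, '2F948/64000000')
```

Under the 64 MiB assumption, the replica asked the primary for the segment that begins exactly at its streaming start position, and that is the segment the primary reports as already removed.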

I was wondering if anyone has come across this issue and knows how to fix the replication.

Thanks in advance! Mehdi

1 Answer

Hi, if you are using PostgreSQL 9 or later, this may be caused by a bug affecting WAL handling in the replication process. Try upgrading to an updated minor version: https://www.postgresql.org/about/news/postgresql-123-118-1013-9618-and-9522-released-2038/

I see that you have one other read replica that is replicating successfully, so for the terminated one, please create a new read replica to replace it. Also, as best practices, you can:

  1. Set up the read replica as Multi-AZ for higher availability.
  2. For operational excellence, subscribe to RDS events (such as DB instance events for read replicas), so that if a read replica fails, you can trigger a workflow to launch a new one: https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_Events.Messages.html#USER_Events.Messages.cluster

Thanks.
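The remediation described in this answer (replace the terminated replica with a new Multi-AZ one, then subscribe to RDS events so a failure can trigger a workflow) could be scripted with boto3 roughly as follows. This is a sketch under stated assumptions, not a tested runbook: it assumes boto3 and AWS credentials are configured by the caller, every instance ID, subscription name and SNS topic ARN is hypothetical, and the event categories `"read replica"` and `"failure"` should be confirmed with `describe_event_categories` for your engine. The request arguments are built by plain functions so they can be reviewed without calling AWS. The new replica is created and awaited before the old one is deleted, so the terminated instance remains available for investigation if creation fails.

```python
"""Sketch: replace a terminated RDS read replica and watch replica events.

Assumes boto3 and AWS credentials are set up by the caller; all identifiers
used in the examples are hypothetical placeholders.
"""
from typing import Any, Dict, Iterable, Optional


def replica_request(source_id: str, new_replica_id: str, multi_az: bool = True) -> Dict[str, Any]:
    """Arguments for rds.create_db_instance_read_replica (Multi-AZ by default)."""
    return {
        "DBInstanceIdentifier": new_replica_id,
        "SourceDBInstanceIdentifier": source_id,
        "MultiAZ": multi_az,
    }


def delete_request(old_replica_id: str) -> Dict[str, Any]:
    """Arguments for rds.delete_db_instance; no final snapshot is taken of a replica."""
    return {"DBInstanceIdentifier": old_replica_id, "SkipFinalSnapshot": True}


def subscription_request(name: str, sns_topic_arn: str, replica_ids: Iterable[str]) -> Dict[str, Any]:
    """Arguments for rds.create_event_subscription on replica-related events."""
    return {
        "SubscriptionName": name,
        "SnsTopicArn": sns_topic_arn,
        "SourceType": "db-instance",
        "EventCategories": ["read replica", "failure"],  # assumption: verify for your engine
        "SourceIds": list(replica_ids),
        "Enabled": True,
    }


def replication_status(db_instance: Dict[str, Any]) -> Optional[str]:
    """Read-replication status from one describe_db_instances entry, if present."""
    for info in db_instance.get("StatusInfos", []):
        if info.get("StatusType") == "read replication":
            return info.get("Status")
    return None


def replace_read_replica(rds: Any, source_id: str, old_replica_id: str, new_replica_id: str) -> None:
    """Create the new replica, wait until it is available, then delete the old one."""
    rds.create_db_instance_read_replica(**replica_request(source_id, new_replica_id))
    rds.get_waiter("db_instance_available").wait(DBInstanceIdentifier=new_replica_id)
    rds.delete_db_instance(**delete_request(old_replica_id))


# Example usage (hypothetical identifiers; requires boto3 and AWS credentials):
#   import boto3
#   rds = boto3.client("rds")
#   replace_read_replica(rds, "db-primary", "db-replica-1", "db-replica-2")
#   rds.create_event_subscription(**subscription_request(
#       "replica-alerts", "arn:aws:sns:us-east-1:123456789012:replica-alerts", ["db-replica-2"]))
#   desc = rds.describe_db_instances(DBInstanceIdentifier="db-replica-2")
#   print(replication_status(desc["DBInstances"][0]))
```

Keeping the AWS calls in one small function and the arguments in pure helpers makes it straightforward to call `replace_read_replica` from whatever workflow the event subscription triggers.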
AWS
Answered 1 year ago
