- 最新
- 投票最多
- 评论最多
You contacted us because you observed the following error in the replica of your database even after hot_standby_feedback set to1 and you wanted to know how to fix this.
ERROR: canceling statement due to conflict with recovery DETAIL: User query might have needed to see row versions that must be removed.
Cause of the Issue
Usually the cause of this error comes when some queries running in Read replica trying to fetch the rows, and these rows are already updated/deleted in primary database OR Vacuuming in the primary already cleaned up the old version of these rows. In this scenario since the old version of rows was already cleaned up then the same goes to Read replica through WAL replication, So finally replica faces the conflict with recovery.
Below two settings control the behavior of a standby server that is to receive replication data.
A) max_standby_archive_delay (integer): Sets the maximum delay before canceling queries when a hot standby server is processing archived WAL data. max_standby_archive_delay applies when WAL data is being read from WAL archive When Hot Standby is active, this parameter determines how long the standby server should wait before canceling standby queries that conflict with about-to-be-applied WAL entries. max_standby_archive_delay applies when WAL data is being read from WAL archive (and is therefore not current). The default is 30 seconds.
https://postgresqlco.nf/doc/en/param/max_standby_archive_delay/
B) max_standby_streaming_delay (integer): Sets the maximum delay before canceling queries when a hot standby server is processing streamed WAL data. max_standby_streaming_delay applies when WAL data is being received via streaming replication. When Hot Standby is active, this parameter determines how long the standby server should wait before canceling standby queries that conflict with about-to-be-applied WAL entries. max_standby_streaming_delay applies when WAL data is being received via streaming replication. In case of Postgres 11 version it is 14 seconds and its modifiable to a maximum value of 30 sec only.
https://postgresqlco.nf/doc/en/param/max_standby_streaming_delay/
Suggestions
Unfortunately, as we don't have access to your data or databases due to AWS security and privacy policies, I would like to provide you with the following recommendations to resolve this issue and you may also consider to engage a DBA/Application team to investigate further.
If there are continuous long running queries present in reader instance, this workload pattern eventually causes the reader transactions to terminate due to conflict with recovery. Please make sure that the activities that require a highly restricted level of locking run when there is less activity on the reader instance. An example for this would be DROP and ALTER statements executing on the writer instance. To maintain data consistency, Postgres could cancel the transactions running against these relations on your reader instance. Please do not DROP tables or tablespaces on your writer while there are long running queries or open cursors running against those tables on your reader.
If you have long running queries on your reader instance, add an application side delay on queries running on the writer so they don't coincide with your reader queries. For example, if a long running transaction T1 is already running on your reader, running a transaction T2 on the writer at the same time, will lead to the transactions on the reader getting cancelled since they would result in dirty or phantom reads based on your isolation level. In such a case, your transaction T2 on the writer should be delayed until the transaction T1 on the reader completes. This will ensure consistent reads and that there are no lock conflicts. Set the following parameters to add the delays or test different values to get the optimal setting for your environment. As always, it is important to try these settings in a test environment before promoting it to production.
-max_standby_archive_delay=-1 <----(A value of -1 allows the standby to wait forever for conflicting queries to complete.This can introduce application performance issues. Try different values, max value is 2147483647milliseconds)
-max_standby_streaming_delay=-1 <----((A value of -1 allows the standby to wait forever for conflicting queries to complete.This can introduce application performance issues. Try different values, max value is 2147483647milliseconds)
On a database that has many active transactions, the likelihood of hitting the error increases. Tune your application to operate in batches if possible. If you tune your application workload to operate in batches, tables/resources in your writer are not held with locks for a long time while there are queries running on the same tables/resources on your replica.
I would like to suggest you to refer below AWS documents for reference. [+] https://aws.amazon.com/blogs/database/best-practices-for-amazon-rds-for-postgresql-cross-region-read-replicas/
If we implement the max_* parameters as described above, should we leave hot_standby_feedback=1 or set that back to hot_standby_feedback=0? I'm clear not on what hot_standby_feedback is accomplishing.
相关内容
- AWS 官方已更新 2 年前
- AWS 官方已更新 2 年前
- AWS 官方已更新 2 年前