How do I troubleshoot the "canceling statement due to conflict with recovery" error when I query the read replica for my Amazon RDS for PostgreSQL DB instance?

3 minute read
1

I set up a read replica for my Amazon Relational Database Service (Amazon RDS) for PostgreSQL instance. When I query the read replica, I receive the "canceling statement due to conflict with recovery" error.

Resolution

If there's a replication conflict between the primary instance and the read replica, then you receive the cancelling statement error. PostgreSQL can't apply the WAL information on the read replica because the changes might block an activity that's happening on the read replica. For example, you run a DROP statement on the primary instance when a SELECT statement is running on the dropped table in the read replica. If the read replica applies the WAL record and cancels the SELECT statement, then you receive the cancelling statement error.

To resolve the canceling statement error, set custom parameters on the read replica based on the details of the error message.

You can also use the pg_stat_database_conflicts view on the read replica for statistics about statements that are canceled because of conflicts with recovery. For more information, see 27.2.18 pg_stat_database_conflicts on the PostgreSQL website.

Error message details: "User was holding a relation lock for too long"

To resolve this issue, change the max_standby_streaming_delay or max_standby_archive_delay parameters to allow more time before the read replica cancels a conflicting standby statement. If the read replica reads the WAL data from the streaming replication, then modify the max_standby_streaming_delay parameter. If the read replica reads the WAL data from the archive location in Amazon Simple Storage Service (Amazon S3), then modify the max_standby_archive_delay parameter.

For more information about these parameters, see max_standby_streaming_delay (integer) and max_standby_archive_delay (integer) on the PostgreSQL website.

If you set the parameter value to 0, then the read replica cancels the conflicting queries and applies the WAL entries on the replica instance. If you set the value to -1, then you allow the replica instance to wait forever for conflicting queries, and replication lag increases. Based on your use case, tune the values of these parameters to balance query cancellation or replication lag.

Note: If you increase max_standby_archive_delay to keep queries that conflict with reading WAL archive entries, then it's a best practice to also increase max_standby_streaming_delay to prevent cancelations.

Error message details: "User query might have needed to see row versions that must be removed"

This issue typically occurs when a transaction on the read replica is reading tuples that are set for deletion on the primary instance. To resolve this issue, you might need to activate the hot_standby_feedback parameter.

When you activate hot_standby_feedback, the read replica sends feedback messages to the primary instance with information about the oldest active transaction. As a result, the primary instance doesn't remove records that the transaction might need. For more information about this parameter, see hot_standby_feedback (boolean) on the PostgreSQL website.

Important: The hot_standby_feedback parameter is deactivated by default. When you activate hot_standby_feedback on the read replica, long-running queries on the read replica might cause table bloat on the primary instance. Vacuum operations don't remove the dead tuples (bloat) that long-running queries might require. To reduce the risk of bloat, it's a best practice to also reduce autovacuum_threshold and autovacuum_vacuum_scale_factor and increase autovacuum_cost_limit.

3 Comments

Thank you for this post. I have a couple of questions: -> If you were to use a read-replica for analytics purposes, what would be a sensible configuration for these parameters? like all in -1 so that the queries have enough time processing? -> Do increments in time from these parameters impact the primary instance? If so, how can I mesure what part of the primary instance's performance corresponds to the modifications of these parameters on a read-replica?

replied 3 months ago

Thank you for your comment. We'll review and update the Knowledge Center article as needed.

profile pictureAWS
MODERATOR
replied 3 months ago

Hi team,

Any resolution for the below error?

[ERROR] OperationalError: ERROR: canceling statement due to conflict with recovery

DETAIL: User query may not have access to page data due to replica disconnect.

HINT: When the replica reconnects you will be able to repeat your command.

replied 3 months ago