Help us improve the AWS re:Post Knowledge Center by sharing your feedback in a brief survey. Your input can influence how we create and update our content to better support your AWS journey.
How do I troubleshoot logical replication issues for an Aurora PostgreSQL-Compatible source DB cluster?
I want to troubleshoot logical replication issues for my Amazon Aurora PostgreSQL-Compatible Edition source database (DB) cluster.
Resolution
Configure replication parameters
Before you troubleshoot logical replication issues, configure the following parameters for your DB cluster parameter group:
- Set the rds.logical_replication value to 1 to activate logical replication on your DB cluster.
- Set a value for max_replication_slots that includes enough slots for your expected subscription count and additional table synchronization processes.
- Set max_wal_senders to a value that supports your max_replication_slots quota and your current physical replicas.
- Set max_logical_replication_workers to a value that supports your subscription count and the number of additional workers that the replication system requires for table synchronization.
- Set max_worker_processes to a value that's at least one more than the max_logical_replication_workers value. For example, if max_logical_replication_workers is 25, then set max_worker_processes to 26.
- If data copy is slow, then increase the max_sync_workers_per_subscription value to control the number of synchronization workers that process the subscription setup and new table additions.
Important: To apply the preceding changes, you must reboot your Aurora PostgreSQL-Compatible DB cluster.
Check publication and subscription configuration
After you configure the logical replication parameters, verify that you correctly configured the publications and subscriptions. Make sure that the publication includes all the intended tables, the subscription parameters are correct, and the replication user has the appropriate permissions.
Connect to your source database, and then run the following commands.
To check your publication configuration, run the following commands:
SELECT * FROM pg_publication; SELECT * FROM pg_publication_tables;
To check your subscription configuration, run the following commands:
SELECT * FROM pg_subscription; SELECT * FROM pg_subscription_rel;
Confirm that the replication slots are active
Connect to your source database, and then run the following command:
SELECT slot_name, plugin, slot_type, active, restart_lsn, confirmed_flush_lsn, pg_current_wal_lsn(), pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn)) AS lag FROM pg_replication_slots;
If slots are inactive or show excessive lag, then review the replication workers to troubleshoot the issue.
Confirm that the replication workers are active
Connect to your target database, and then run the following command:
SELECT pid, state, query, wait_event, backend_type FROM pg_stat_activity WHERE backend_type LIKE 'logical replication%';
If there are no replication workers, then restart your subscription.
To turn off your subscription, run the following command:
ALTER SUBSCRIPTION subscription_name DISABLE;
To turn on your subscription, run the following command:
ALTER SUBSCRIPTION subscription_name ENABLE;
If there are still no replication workers after you restart the subscription, then check the PostgreSQL error logs for error messages.
Monitor replication progress and lag
Your replication might experience lag because of high transaction volume and large transactions on the publication. Or, there might be constraints on the publication or subscription.
To determine whether replication stopped or is slow, monitor the replication progress for a few intervals.
Connect to your DB cluster, and then run the following commands.
To check the replication progress for your publication, run the following command:
SELECT slot_name, pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn)) AS lag, active FROM pg_replication_slots WHERE slot_type = 'logical';
To check the replication progress for your subscription, run the following command:
SELECT subname, pid, received_lsn, latest_end_lsn, pg_size_pretty(pg_wal_lsn_diff(latest_end_lsn, received_lsn)) AS lag FROM pg_stat_subscription;
To minimize the replication lag, monitor the write-through cache. If the cache size isn't sufficient for your workloads, then you can manually change the rds.logical_wal_cache value. For more information, see Achieve up to 17x lower replication lag with the new write-through cache for Aurora PostgreSQL.
Monitor resource constraints
To troubleshoot resource constraints on the publisher and subscriber, turn on Enhanced Monitoring and monitor CPUUtilization, FreeableMemory, SwapUsage, and NetworkThroughput. Set up Amazon CloudWatch alarms to be notified when replication issues occur.
For more information, see How do I identify and troubleshoot performance issues and slow-running queries in my Amazon RDS for PostgreSQL or Aurora PostgreSQL-Compatible DB instance?
Identify schema mismatches and resolve data inconsistencies
Connect to your source database and target database. Then, confirm that there are columns in the replicated tables and the data types are compatible. Also, make sure that the primary keys and unique constraints are consistent.
To turn on your subscription, run the following command:
ALTER SUBSCRIPTION subscription_name ENABLE;
To compare table definitions, run the following command on both databases:
SELECT column_name, data_type, character_maximum_length FROM information_schema.columns WHERE table_name = 'your_table_name' ORDER BY ordinal_position;
Note: Replace your_table_name with your table name.
Resolve conflicts
Native PostgreSQL logical replication can't detect data conflicts from multiple publishers or replicated modifications that conflict with data that you changed locally. If there's a current row with the same key, then Aurora applies the update and the insert fails.
To identify what caused the conflict, check the PostgreSQL logs.
The following example log shows that the replication failed because it tried to insert a record with an asset ID that already exists in the target database:
ERROR: 23505: duplicate key value violates unique constraint "asset_pkey" DETAIL: Key (asset_id)=(7) already exists. CONTEXT: processing remote data for replication origin "pg_32796" during message type "INSERT" for replication target relation "public.asset" in transaction 315434, finished at 0/6A12458
The replication origin is pg_32796 and finishes at Logical Sequence Number (LSN) 0/6A12458.
To manually fix the data, you can stop the replication on the conflict or configure the subscription with the disable_on_error option.
Or, you can review the data on the source and target to determine whether you can skip the LSN that caused the conflict. Then, use the pg_replication_origin_advance() function to skip the LSN that caused the conflict. For more information, see pg_replication_origin_advance (node_name text, lsn pg_lsn) on the PostgreSQL website.
Note: Aurora PostgreSQL-Compatible versions 15 and later support the pg_replication_origin_advance() function.
To skip the LSN, complete the following steps:
-
Run the following SQL command to temporarily turn off the subscription:
ALTER SUBSCRIPTION subscription_name DISABLE;Note: If you configured the subscription with the disable_on_error option, then the subscription automatically turns off after an error.
-
Use the following pg_replication_origin_advance() function to advance the origin to Finish_LSN+1:
SELECT pg_replication_origin_advance('node_name','Finish_LSN+1'::pg_lsn);Note: Replace node_name with your node's name.
-
Run the following command to turn on the subscription.:
ALTER SUBSCRIPTION subscription-name ENABLE;Note: Replace subscription-name with your subscription name.
To resolve several data inconsistencies, you might need to clean the target table and reconfigure the publication and subscription.
Related information
Replication with Amazon Aurora PostgreSQL
PostgreSQL Logical Replication on PostgreSQL website
Setting up logical replication for your Aurora PostgreSQL DB cluster
- Topics
- Database
- Language
- English

Relevant content
- Accepted Answerasked 3 years ago
- asked 10 months ago
AWS OFFICIALUpdated 4 months ago