How can I resolve common issues when I use read replicas in Aurora?


I have an Amazon Aurora MySQL-Compatible Edition DB instance, and I'm experiencing issues when I use read replicas. I want to troubleshoot these issues.


Promote an Aurora read replica

To promote a read replica instance to the writer role, perform a manual failover.

Complete the following steps:

  1. Open the Amazon Relational Database Service (Amazon RDS) console.
  2. In the navigation pane, choose Databases.
  3. Select the writer instance for your Aurora DB cluster.
  4. Choose Actions, and then choose Failover.
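You can also start a manual failover programmatically. The following Python sketch assumes the AWS SDK for Python (Boto3) and takes an RDS client created with boto3.client("rds") as a parameter. The cluster and instance identifiers in the usage comment are hypothetical. The FailoverDBCluster API also lets you name the reader to promote through TargetDBInstanceIdentifier; if you omit it, Aurora chooses the reader.

```python
from typing import Any, Optional


def fail_over_cluster(rds_client: Any, cluster_id: str,
                      target_instance_id: Optional[str] = None) -> dict:
    """Start a manual failover for an Aurora DB cluster.

    rds_client is assumed to be a Boto3 RDS client. When target_instance_id
    is None, Aurora chooses which reader to promote.
    """
    params = {"DBClusterIdentifier": cluster_id}
    if target_instance_id is not None:
        # Request that this specific reader becomes the new writer.
        params["TargetDBInstanceIdentifier"] = target_instance_id
    return rds_client.failover_db_cluster(**params)


# Example usage (hypothetical identifiers):
# import boto3
# rds = boto3.client("rds")
# fail_over_cluster(rds, "my-aurora-cluster", "my-aurora-reader-1")
```

Passing the client in, rather than creating it inside the function, keeps the sketch easy to test with a stub.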

If the writer instance becomes unavailable, then Aurora automatically fails over to a read replica instance. The writer instance can become unavailable for several reasons, such as resource contention or maintenance activity.

If you have multiple readers, then assign a promotion priority tier to each instance in your cluster. Tiers range from 0 to 15, and tier 0 is the highest priority. When the writer instance fails, Aurora promotes the replica with the highest priority (the lowest tier number) as the new writer.
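To reason about which reader Aurora is likely to promote, the following sketch models the selection order: the lowest tier number wins, and among readers in the same tier the largest instance wins. The size_rank field is a hypothetical stand-in for instance size (for example, vCPU count), and set_promotion_tier assumes a Boto3 RDS client. This is an illustration of the behavior, not Aurora's internal implementation.

```python
from dataclasses import dataclass
from typing import Any, Iterable, Optional


@dataclass
class Reader:
    instance_id: str
    promotion_tier: int  # 0 is the highest priority, 15 the lowest
    size_rank: int       # hypothetical: larger value means a larger instance


def likely_failover_target(readers: Iterable[Reader]) -> Optional[Reader]:
    """Model which reader Aurora would promote if the writer fails.

    Lowest tier number first, then largest instance. Aurora chooses
    arbitrarily among exact ties; min() here just keeps the first one.
    """
    return min(readers, key=lambda r: (r.promotion_tier, -r.size_rank),
               default=None)


def set_promotion_tier(rds_client: Any, instance_id: str, tier: int) -> dict:
    """Assign a promotion tier to an instance with a Boto3 RDS client."""
    if not 0 <= tier <= 15:
        raise ValueError("promotion tier must be between 0 and 15")
    return rds_client.modify_db_instance(
        DBInstanceIdentifier=instance_id, PromotionTier=tier
    )
```

For example, with readers in tiers 1, 0, and 0, the model picks the larger of the two tier-0 readers.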

You can also promote a cross-Region Aurora replica to a standalone DB cluster. After you start the promotion, cross-Region replication stops. The newly promoted cluster operates as an independent DB cluster that accepts both read and write operations.
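A cross-Region replica cluster can also be promoted through the RDS API. The sketch below assumes a Boto3 RDS client created in the replica's Region, for example boto3.client("rds", region_name="us-west-2") (the Region is hypothetical), and calls PromoteReadReplicaDBCluster on the replica cluster. It returns the cluster status from the API response; you might want to wait until the cluster reports that it's available before you send writes to it.

```python
from typing import Any


def promote_cross_region_replica(rds_client: Any, replica_cluster_id: str) -> str:
    """Promote a cross-Region Aurora replica cluster to a standalone cluster.

    rds_client is assumed to be a Boto3 RDS client in the replica's Region.
    Returns the Status field of the DBCluster in the API response.
    """
    response = rds_client.promote_read_replica_db_cluster(
        DBClusterIdentifier=replica_cluster_id
    )
    # After promotion, replication from the source cluster stops.
    return response["DBCluster"]["Status"]
```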

Measure replication lag

Because all Aurora DB instances in a DB cluster share a common data volume, replication lag is minimal. However, in some scenarios, you might observe slightly increased lag on the readers.

Note: Cross-Region replicas use logical replication. The rate at which changes are made and applied, and the network latency between the selected Regions, can affect cross-Region replica lag. Cross-Region replicas that use Aurora databases typically have a lag of under 1 second.

To measure replication lag, use the following Amazon CloudWatch metrics:

  • AuroraReplicaLag measures, in milliseconds, the lag between the writer instance and a reader instance in the same Region.
  • AuroraBinlogReplicaLag measures, in seconds, how far an Aurora DB cluster that replicates through binary logs lags behind its source.
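The following sketch reads the maximum value of a lag metric over a recent window with the CloudWatch GetMetricStatistics API. It assumes a Boto3 CloudWatch client, for example boto3.client("cloudwatch"). The 30-minute window and 60-second period are arbitrary choices, and you pass in the dimensions that the metric is published with; AuroraReplicaLag is typically queried per reader with the DBInstanceIdentifier dimension.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, Optional


def max_lag(cloudwatch_client: Any, metric_name: str,
            dimensions: Dict[str, str], minutes: int = 30) -> Optional[float]:
    """Return the highest value of an RDS lag metric over the last `minutes`.

    cloudwatch_client is assumed to be a Boto3 CloudWatch client. The unit
    is whatever the metric reports (milliseconds for AuroraReplicaLag).
    Returns None when CloudWatch has no data points for the window.
    """
    end = datetime.now(timezone.utc)
    response = cloudwatch_client.get_metric_statistics(
        Namespace="AWS/RDS",
        MetricName=metric_name,
        Dimensions=[{"Name": k, "Value": v} for k, v in dimensions.items()],
        StartTime=end - timedelta(minutes=minutes),
        EndTime=end,
        Period=60,  # one data point per minute
        Statistics=["Maximum"],
    )
    points = response.get("Datapoints", [])
    if not points:
        return None
    return max(point["Maximum"] for point in points)


# Example usage (hypothetical reader identifier):
# import boto3
# cw = boto3.client("cloudwatch")
# max_lag(cw, "AuroraReplicaLag", {"DBInstanceIdentifier": "my-aurora-reader-1"})
```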

Improve replication performance

To reduce replication lag, take the following actions:

  • If the reader instance is smaller than the writer instance, then the reader might not keep up with the volume of changes. To avoid overloading the reader instances, it's a best practice to use the same instance size for all instances in a cluster.
    Note: If there's a heavy workload on the writer instance, then you might notice temporary read replica lag. The lag decreases after the reader instance catches up with the writer instance.
  • If long-running transactions are in progress, then replica lag might occur on the readers. To reduce replica lag, run your changes in smaller batches and commit frequently.
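The batching advice can be sketched with any Python DB-API 2.0 connection: apply rows with executemany in fixed-size batches and commit after each batch, so that no single transaction stays open for long. The batch size of 1,000 is an arbitrary default, and the table in the usage comment is hypothetical. Placeholder style depends on the driver: MySQL drivers such as PyMySQL use %s, while sqlite3 (convenient for local testing) uses ?.

```python
from typing import Any, Sequence


def execute_in_batches(conn: Any, sql: str, rows: Sequence[Sequence[Any]],
                       batch_size: int = 1000) -> int:
    """Apply rows in several short transactions instead of one long one.

    conn is any DB-API 2.0 connection. Each batch is committed before the
    next one starts. Returns the number of batches committed.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    cursor = conn.cursor()
    batches = 0
    try:
        for start in range(0, len(rows), batch_size):
            cursor.executemany(sql, rows[start:start + batch_size])
            conn.commit()  # end this transaction before the next batch
            batches += 1
    finally:
        cursor.close()
    return batches


# Example usage with a MySQL driver (hypothetical table and rows):
# execute_in_batches(conn, "INSERT INTO orders_archive (id, total) VALUES (%s, %s)",
#                    rows, batch_size=500)
```

Smaller batches mean more commits overall, so the right size is a trade-off between writer throughput and how long each transaction holds changes that readers must apply.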

For information about how to use native binlog-based MySQL replication to troubleshoot replica lag, see Overview of backing up and restoring an Aurora DB cluster.

Troubleshoot high replication lag

To check for high replication lag, use the AuroraReplicaLag CloudWatch metric. High replication lag can cause a reader instance to restart. To prevent frequent reader restarts caused by high replication lag, see Why did my Amazon Aurora read replica fall behind and restart?
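To be notified before lag leads to reader restarts, you can create a CloudWatch alarm on AuroraReplicaLag. This sketch assumes a Boto3 CloudWatch client. The alarm name, the threshold, and the choice of five consecutive 1-minute periods are hypothetical values to tune for your workload. Add AlarmActions (for example, an Amazon SNS topic ARN) if you want the alarm to send notifications.

```python
from typing import Any


def create_replica_lag_alarm(cloudwatch_client: Any, reader_instance_id: str,
                             threshold_ms: float, alarm_name: str) -> None:
    """Alarm when a reader's AuroraReplicaLag stays above threshold_ms.

    cloudwatch_client is assumed to be a Boto3 CloudWatch client.
    AuroraReplicaLag is measured in milliseconds.
    """
    cloudwatch_client.put_metric_alarm(
        AlarmName=alarm_name,
        Namespace="AWS/RDS",
        MetricName="AuroraReplicaLag",
        Dimensions=[{"Name": "DBInstanceIdentifier", "Value": reader_instance_id}],
        Statistic="Maximum",
        Period=60,            # evaluate 1-minute data points
        EvaluationPeriods=5,  # hypothetical: 5 consecutive breaching minutes
        Threshold=threshold_ms,
        ComparisonOperator="GreaterThanThreshold",
    )


# Example usage (hypothetical values):
# import boto3
# cw = boto3.client("cloudwatch")
# create_replica_lag_alarm(cw, "my-aurora-reader-1", 1000.0, "aurora-reader-1-lag")
```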

Set up GTID-based replication

Aurora doesn't use native binlog replication to replicate data to read replica instances, so you can't use global transaction identifiers (GTIDs) to replicate data between instances in the same cluster. However, you can set up GTID-based replication in certain scenarios. For more information about how to use GTID-based replication in Aurora MySQL-Compatible, see Amazon Aurora for MySQL compatibility now supports global transaction identifiers (GTIDs) replication.

Note: You can set up GTID-based replication between an Amazon RDS for MySQL DB instance and an Aurora cluster, and between Aurora clusters. The replica must use the source as an external replication source (master). Be sure to enable binary logging on the source before you start the replication process.
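Before you start GTID-based replication, you can confirm the relevant server settings with SQL. The following sketch assumes a DB-API cursor from a MySQL driver that uses %s placeholders, such as PyMySQL. read_settings and missing_prerequisites check that gtid_mode and enforce_gtid_consistency are ON, and on the source also that log_bin is ON. start_gtid_replication runs on the target Aurora cluster and calls the Aurora MySQL version 3 procedure mysql.rds_set_external_source_with_auto_position, then mysql.rds_start_replication. Aurora MySQL version 2 uses mysql.rds_set_external_master_with_auto_position instead. Confirm the procedure names and parameters for your engine version before you rely on this sketch; the host, user, and password in the usage comment are hypothetical.

```python
from typing import Any, Dict, List

# Settings that must be ON on both the source and the target.
GTID_SETTINGS = ("gtid_mode", "enforce_gtid_consistency")


def read_settings(cursor: Any) -> Dict[str, str]:
    """Read the GTID and binary log settings from a MySQL-compatible server."""
    cursor.execute(
        "SHOW GLOBAL VARIABLES WHERE Variable_name IN "
        "('gtid_mode', 'enforce_gtid_consistency', 'log_bin')"
    )
    return {str(name): str(value).upper() for name, value in cursor.fetchall()}


def missing_prerequisites(settings: Dict[str, str], is_source: bool) -> List[str]:
    """Return the names of required settings that are not ON."""
    required = list(GTID_SETTINGS)
    if is_source:
        required.append("log_bin")  # the source must write binary logs
    return [name for name in required if settings.get(name) != "ON"]


def start_gtid_replication(cursor: Any, source_host: str, source_port: int,
                           repl_user: str, repl_password: str,
                           ssl_encryption: int = 1) -> None:
    """Point the target Aurora MySQL cluster at the source and start replicating."""
    # Aurora MySQL version 3 procedure; version 2 uses
    # mysql.rds_set_external_master_with_auto_position instead.
    cursor.execute(
        "CALL mysql.rds_set_external_source_with_auto_position"
        "(%s, %s, %s, %s, %s)",
        (source_host, source_port, repl_user, repl_password, ssl_encryption),
    )
    cursor.execute("CALL mysql.rds_start_replication")


# Example usage (hypothetical connection details):
# start_gtid_replication(target_cursor, "source.example.com", 3306,
#                        "repl_user", "repl_password")
```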

For more information about GTID, see GTID format and storage on the MySQL website.
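Per the MySQL documentation, a GTID has the form source_id:transaction_id, and a GTID set, such as the value of gtid_executed, lists one or more source IDs, each followed by transaction intervals (for example, 3E11FA47-71CA-11E1-9E33-C80AA9429562:1-5:11). The following sketch parses that untagged format into ranges, which can make it easier to see which transactions a server reports as applied. It doesn't handle the tagged GTID format in newer MySQL releases and doesn't validate the source IDs.

```python
from typing import Dict, List, Tuple


def parse_gtid_set(gtid_set: str) -> Dict[str, List[Tuple[int, int]]]:
    """Parse 'uuid:1-5:11,uuid2:7' into {uuid: [(1, 5), (11, 11)], uuid2: [(7, 7)]}.

    Source IDs are lowercased. Newlines (as in gtid_executed output) are ignored.
    """
    result: Dict[str, List[Tuple[int, int]]] = {}
    for part in gtid_set.replace("\n", "").split(","):
        part = part.strip()
        if not part:
            continue
        source_id, *intervals = part.split(":")
        ranges = result.setdefault(source_id.lower(), [])
        for interval in intervals:
            # "1-5" is a range; a bare "11" is a single transaction.
            first, _, last = interval.partition("-")
            ranges.append((int(first), int(last or first)))
    return result


def transaction_count(gtid_set: str) -> int:
    """Count the transactions covered by a GTID set."""
    return sum(last - first + 1
               for ranges in parse_gtid_set(gtid_set).values()
               for first, last in ranges)
```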

Related information

Replicating Amazon Aurora MySQL DB clusters across AWS Regions

Replication with Amazon Aurora

AWS OFFICIAL. Updated a month ago

It'd be helpful to include troubleshooting steps in the event that there is no replication happening at all.

replied 10 months ago

Thank you for your comment. We'll review and update the Knowledge Center article as needed.

AWS
replied 10 months ago