How do I resolve high HADR_SYNC_COMMIT wait times in Amazon RDS for SQL Server Multi-AZ instances?
My Amazon Relational Database Service (Amazon RDS) for Microsoft SQL Server Multi-AZ instance has high HADR_SYNC_COMMIT wait times that cause slow query performance.
Short description
High HADR_SYNC_COMMIT wait times occur when transactions on the primary replica wait for the secondary replica to acknowledge a written transaction log.
The following issues can cause high HADR_SYNC_COMMIT wait times:
- Slow disk I/O or I/O throttling
- High CPU utilization
- Network latency
- Resource-intensive index operations that generate high transaction log volume
For more information about HADR_SYNC_COMMIT wait times, see HADR_SYNC_COMMIT wait updates on the Microsoft website.
Resolution
Prerequisites:
- Set up Enhanced Monitoring to monitor the operating system (OS) that runs on your RDS database (DB) instance.
- Turn on Performance Insights to identify queries that are responsible for the database load and the high HADR_SYNC_COMMIT wait times.
Important: Performance Insights will reach its end of life on June 30, 2026. You can upgrade to the Advanced mode of Database insights before June 30, 2026. If you don't upgrade, then DB clusters that use Performance Insights will default to the Standard mode of Database Insights. Only the Advanced mode of Database Insights will support execution plans and on-demand analysis. If your clusters default to the Standard mode, then you might not be able to use these features on the console. To turn on the Advanced mode, see Turning on the Advanced mode of Database Insights for Amazon RDS and Turning on the Advanced mode of Database Insights for Amazon Aurora.
Monitor replication lag and log send queue issues
Run the following SQL query:
SELECT ag.name AS ag_name, ar.replica_server_name, db_name(drs.database_id) AS database_name, drs.log_send_queue_size AS log_send_queue_kb, drs.log_send_rate AS log_send_rate_kb_s, drs.redo_queue_size AS redo_queue_kb, drs.redo_rate AS redo_rate_kb_s, drs.last_sent_time, drs.last_received_time, drs.last_hardened_time, DATEDIFF(MILLISECOND, drs.last_sent_time, drs.last_hardened_time) AS send_to_harden_lag_ms, drs.synchronization_state_desc, drs.synchronization_health_desc FROM sys.dm_hadr_database_replica_states drs JOIN sys.availability_replicas ar ON drs.replica_id = ar.replica_id JOIN sys.availability_groups ag ON ar.group_id = ag.group_id WHERE drs.is_primary_replica = 0 ORDER BY drs.log_send_queue_size DESC;
Review the query results for high values in log_send_queue_kb and send_to_harden_lag_ms for replication lag. For more information, see Troubleshooting log send queueing in an Always On availability group on the Microsoft website.
Identify queries that cause high HADR_SYNC_COMMIT wait times
Complete the following steps:
- Open the Database Instance Dashboard on the Amazon CloudWatch console.
- Select the time frame that corresponds to the period of high wait times.
- Check the Top SQL tab to identify queries with the longest execution times and high HADR_SYNC_COMMIT wait times.
- Review the resource-intensive queries and the wait events during this period.
Check metrics for your DB instance in CloudWatch
To identify resource bottlenecks, use the CloudWatch console to check the following metrics for your DB instance:
- Confirm that the ReadIOPS and WriteIOPS metrics don't exceed the allocated IOPS for the DB instance.
- Confirm that the ReadThroughput and WriteThroughput metrics don't exceed the allocated throughput for the DB instance.
- Check whether the CPUUtilization metric is consistently high during periods of high HADR_SYNC_COMMIT wait times.
- Check whether the FreeableMemory metric is low. Low memory causes an increase in disk I/O operations when the buffer pool can't effectively cache data.
- Check whether the DiskQueueDepth metric is high. High DiskQueueDepth values show that I/O requests accumulate faster than the storage subsystem can process them.
Use Enhanced Monitoring to identify resource bottlenecks
To determine whether read or write IOPS exceed the allocated IOPS and throughput or the instance throttles I/O operations, monitor OS metrics.
Complete the following steps:
- Open the Amazon Aurora and RDS console.
- In the navigation pane, choose Databases, and then select the database that you want to monitor.
- Choose the Monitoring tab.
- Choose the Monitoring menu, and then choose Enhanced Monitoring.
- On the Monitoring menu, choose Manage graphs.
- Select readIOsPS, writeIOsPS, readKb, and writeKb disk I/O metrics.
- Choose the corresponding settings icon to switch between the primary and secondary instance. Then, review the disk I/O metrics for both instances.
Reduce transaction log volume
To reduce the volume of transaction log records that are generated and the associated HADR_SYNC_COMMIT wait times, take the following actions:
- To reduce the volume of transaction log records that are generated during modification operations, remove unnecessary indexes.
- Schedule index rebuild and reorganization operations during low-traffic periods. Use index reorganization instead of full rebuilds when possible. Reorganization generates fewer log records than full rebuilds.
- Reduce the maximum degree of parallelism (MAXDOP) for index maintenance operations. A lower MAXDOP limits parallelism and reduces the volume of log data that must be written to the secondary replica. However, a lower MAXDOP also reduces the operation speed.
Increase IOPS and throughput allocation
If CloudWatch or Enhanced Monitoring metrics confirm that IOPS or throughput is throttled, then take the following actions:
- Modify your DB instance to increase the allocated IOPS and throughput.
- Check that the DB instance class supports the required IOPS and throughput levels for your workload.
Note: If workload doesn't increase with high HADR_SYNC_COMMIT wait times, then upgrade the DB instance to the latest supported minor version to improve replication performance. As an immediate workaround, reboot the DB instance without failover to restore normal operation.
Related information
SQL Server, availability replica object on the Microsoft website
Monitoring Amazon RDS metrics with Amazon CloudWatch
Monitoring OS metrics with Enhanced Monitoring
- Topics
- Database
- Language
- English

Relevant content
- asked 2 years ago
- Accepted Answerasked 2 years ago
- Accepted Answerasked 3 years ago
- asked 3 years ago