How do I troubleshoot a slowly running SELECT query in my Amazon Aurora MySQL DB cluster?
I want to troubleshoot a SELECT query that runs slowly in my Amazon Aurora MySQL-Compatible Edition DB cluster.
Short description
Your SELECT query might run slowly on your Amazon Aurora MySQL DB cluster for the following reasons:
- You're overusing your database system resources.
- The database is locking.
- The SELECT query does full table scans on large tables. Or, the query doesn't have the necessary indexes.
- Your InnoDB history list length (HLL) increased because of long-running transactions.
Resolution
To troubleshoot a slowly running SELECT query, take the following actions.
Use metrics to monitor your Amazon RDS systems resources
High CPU, low memory, or a workload that exceeds your DB instance type's capabilities might cause your SELECT query to run slowly. To determine how and where your CPU is used, use the following tools:
- To monitor your CPU utilization, use Amazon CloudWatch metrics for Amazon Aurora.
- To view details of operating system (OS) metrics, use Enhancing Monitoring.
- To check if your DB load exceeds the maximum vCPU, use Performance Insights. You can also monitor your load bearing queries and SQLs by waits, and identify the users that cause the maximum waits.
SELECT queries might run slowly because of disk seeks. To minimize the disk I/O, the database engine caches the block read from the disk. When the database needs the same data, the data is fetched from the memory instead of the disk. To determine whether you serve a specific query from disk or memory, use the following metrics:
- Check the VolumeReadsIOPS metric to view the number of billed volume level (disk) read operations. It's a best practice to keep this value is as low as possible.
- Check the BufferCacheHitRatio metric to view the percentage of requests that the buffer cache serves. It's a best practice to keep this value is as high as possible.
Note: If BufferCacheHitRatio drops and your SELECT query is slow, then your queries are processed from underlying volumes.
To identify slow SELECT statements, use the slow_query_log. To record slow queries, turn on slow query logging for your DB cluster.
Identify deadlocks and wait events
Note: If you receive errors when you run AWS Command Line Interface (AWS CLI) commands, then see Troubleshoot AWS CLI errors. Also, make sure that you're using the most recent AWS CLI version.
A deadlock occurs when two or more transactions can't continue because they block each other. To identify deadlocks on your database, turn on the innodb_print_all_deadlocks parameter in your parameter groups. For more information, see innodb_print_all_deadlocks on the MySQL website. Then, monitor the mysql-error.log from the Amazon RDS console, AWS Command Line Interface, or API.
(Optional) To identify deadlocks, log in to a MySQL admin account, and then run the following command:
SHOW ENGINE INNODB STATUS\G;
Note: In the command's output, check the Latest Detected Deadlock section.
Check whether your query uses an index
If a query doesn't have an index or does full table scans, then the query runs slowly. Indexes allow SELECT queries to run faster.
To check if your query uses an index, use the EXPLAIN query. For more information, see EXPLAIN statement on the MySQL website. In the EXPLAIN output, check the table names, the key that's used, and number of rows that are scanned during the query. If the output doesn't show any keys in use, then create an index on the columns in the WHERE clause. If the table has the indexing required, then check whether the statistics table is up to date. For more information, see INFORMATION_SCHEMA STATISTICS table on the MySQL website.
Check the history list length (HLL)
InnoDB uses multi-version concurrency control (MVCC). MVCC maintains multiple copies of the same record to maintain read consistency. The HLL is the total number of undo logs that contain modifications in the history list. When there's a long-running transaction that writes or reads data, the HLL increases until the transaction completes or rolls back. Other transactions modify the tables that the long-running transaction uses. If your workload requires multiple open or long-running transactions, then your database might have a high HLL. Also, if purge threads can't keep up with DB changes, then you might have a high HLL. A high HLL causes higher resource use, slow and inconsistent SELECT statement performance, and an increase in storage. In extreme cases, a high HLL might cause a database outage.
To check the history list length, run the following command:
SHOW ENGINE INNODB STATUS;
Example output:
------------ TRANSACTIONS ------------ Trx id counter 26368570695 Purge done for trx's n:o < 26168770192 undo n:o < 0 state: running but idle History list length 1839
For Amazon Aurora MySQL, the HLL is at the cluster level. To check the HLL at the cluster level, connect to your writer and run the following query:
SELECT server_id, IF(session_id = 'master_session_id', 'writer', 'reader') AS ROLE, replica_lag_in_msec, oldest_read_view_trx_id , oldest_read_view_lsn from mysql.ro_replica_status;
Note: You can use the previous query to identify replica lag between the reader and writer nodes. Check the oldest LSN that the DB uses to read from storage and the DB instance's oldest read view TRX ID. Make sure that you check whether one of the instances contains an old read view.
To connect to an instance that contains an old read view, run the following query:
SELECT a.trx_id, a.trx_state, a.trx_started, TIMESTAMPDIFF(SECOND,a.trx_started, now()) as "Seconds Transaction Has Been Open", a.trx_rows_modified, b.USER, b.host, b.db, b.command, b.time, b.state from information_schema.innodb_trx a, information_schema.processlist b where a.trx_mysql_thread_id=b.id order by trx_started;
Note: Use the previous query to identify the session or transaction that has the oldest TRX_ID. To unblock the purge operation, determine whether you can end the session.
To monitor your HLL, use the RollbackSegmentHistoryListLength CloudWatch metric. For older Amazon Aurora MySQL versions, use trx_rseg_history_len:
select NAME AS RollbackSegmentHistoryListLength, COUNT from INFORMATION_SCHEMA.INNODB_METRICS where NAME = 'trx_rseg_history_len';
If Performance Insights is turned on for your Amazon Aurora MySQL instances, then complete the following steps to check the RollbackSegmentHistoryListLength:
- Open the Amazon RDS console.
- In the left navigation pane, choose Performance Insights.
- Select Manage metrics, and then choose Database metrics.
- Choose the trx_rseg_history_len metric, and then choose Update graph.
To resolve high HLL, take the following actions:
- If DML (writes) cause an increase in HLL, then roll back the transaction to cancel the statement. This process is lengthy because of the number of updates that need to be rolled back.
- If a READ causes an increase in HLL, then use mysql.rds_kill_query to cancel the query.
Note: Contact your database administrator to check whether you can cancel a query.
Note: To prevent high HLL, it's a best practice to commit the data in smaller batches. Also, don't reboot the DB cluster or instance. Purge the HLL when it can access the memory data in the buffer pool. If you reboot the database, then the survivable page cache might be lost. If the survivable page cache is lost, then data pages from the cluster volume must be read to purge the HLL. This process is slower than a purge from memory and results in additional I/O billing costs.
Related information
Monitor Amazon Aurora MySQL, Amazon RDS for MySQL and MariaDB logs with Amazon CloudWatch

Relevant content
- asked 2 years agolg...
- Accepted Answerasked 2 months agolg...
- asked 2 years agolg...
- AWS OFFICIALUpdated 2 years ago
- AWS OFFICIALUpdated a month ago
- AWS OFFICIALUpdated 6 months ago
- AWS OFFICIALUpdated a month ago