How do I troubleshoot a slowly running SELECT statement in my Aurora MySQL-Compatible DB cluster?
I want to troubleshoot a slow running SELECT statement in my Amazon Aurora MySQL-Compatible Edition DB cluster.
Short description
Your SELECT statement might run slowly on your Aurora MySQL-Compatible DB cluster for the following reasons:
- You're overusing your database system resources.
- The database is locking.
- The SELECT statement does full table scans on large tables. Or, the query doesn't have the necessary indexes.
- Long-running transactions are increasing your InnoDB history list length (HLL).
Resolution
Note: If you receive errors when you run AWS Command Line Interface (AWS CLI) commands, then see Troubleshooting errors for the AWS CLI. Also, make sure that you're using the most recent AWS CLI version.
Use CloudWatch Database Insights to check why a SELECT statement is slow
Turn on Performance Insights and use Database Insights to detect queries that cause high DB load. Monitor dimensions such as SQL, Users, and Waits in the Database load chart. Also monitor the Slow SQL queries section of the Database telemetry tab. Additionally, you can analyze query execution from the SQL statistics of Performance Insights. For example, if the number of rows that you examined per call is higher than normal, then the execution plan is inefficient.
Use metrics to monitor your DB instance systems resources
High CPU, low memory, or large workloads that exceed the capabilities of your DB instance class might cause your SELECT statement to run slowly. To monitor your DB instance resources, use the following tools:
- Use Amazon CloudWatch metrics for Amazon Aurora to monitor your CPU utilization.
- Use Enhanced Monitoring to view details of operating system (OS) metrics.
- Use OS processes data in the Database telemetry tab to check if your DB load exceeds the maximum vCPU.
Disk seeks might cause the SELECT statement to run slowly. To minimize the disk I/O, the database engine caches the block read from the disk. When the database needs the same data, it fetches the data from the memory instead of the disk. To determine whether you serve a specific query from disk or memory, use the following metrics:
- Check the ReadIOPS metric to view the number of disk I/O operations. It's a best practice to keep this value as low as possible.
- Check the BufferCacheHitRatio metric to view the percentage of requests that the buffer cache serves. It's a best practice to keep this value as high as possible.
- Check the FreeableMemory metric to view the available memory of the DB instance. It's a best practice to keep this value stable. Insufficient available memory can cause low BufferCacheHitRatio and high ReadIOPS.
Note: If the BufferCacheHitRatio metric drops, and the SELECT statement is slow, then the engine processes the queries from underlying volumes.
Disk seeks of local storage can also cause the SELECT statement to run slowly. Aurora MySQL-Compatible uses local storage for manual and internal temporary tables. For more information, see New temporary table behavior in Aurora MySQL version 3. To monitor and resolve disk seeks of local storage, check the Rdstemp OS metrics of Database Insights. It's a best practice to keep these values as low as possible.
Network saturation might cause SELECT statements to run slowly. Aurora executes I/O operations to the cluster volume through the network and sends the result of the query to the client through the network. To monitor and resolve network saturation, check the NetworkThroughput and StorageNetworkThroughput metric. Total network throughput must be less than the network bandwidth of your DB instance.
If any resource usage exceeds the capabilities of your DB instance type because of your workload, then upgrade the DB instance class.
Identify deadlocks and blocking locks
A deadlock occurs when two or more transactions can't continue because they block each other. To identify deadlocks on your database, turn on the innodb_print_all_deadlocks parameter in your parameter groups. For more information, see innodb_print_all_deadlocks on the MySQL website. Then monitor the mysql-error.log from the Aurora and RDS console, AWS Command Line Interface, or API.
(Optional) To identify deadlocks, log in to a MySQL admin account, and then run the following command:
SHOW ENGINE INNODB STATUS\G;
Note: In the expected output of the MySQL Workbench, check the Latest Detected Deadlock section.
Even if there's no deadlock, one long transaction that keeps locks might be blocking locks. To identify ongoing blocking locks, see Why was a query to my Amazon RDS for MySQL DB instance blocked when there's no other active session?
Check whether your query uses an index
When a query doesn't have an index or does full table scans, the query runs slowly. Indexes allow SELECT statements to run faster. To check if your query uses an index, use the EXPLAIN statement. For more information, see EXPLAIN statement on the MySQL website.
In the EXPLAIN output, check the table names, the key, and number of rows that the engine scans during the query. If the output doesn't show any keys in use, then create an index on the columns in the WHERE clause. If the table has the required indexing, then check whether the statistics table is up to date. Use the ANALYZE clause to update the statistics. For more information, see The INFORMATION_SCHEMA STATISTICS table on the MySQL website.
To identify slow SELECT statements, use the slow_query_log. To record slow queries, turn on slow query logging for your DB cluster.
Check the HLL
InnoDB uses Multi-Version Concurrency Control (MVCC). MVCC maintains multiple copies of the same record to maintain read consistency. The HLL is the total number of undo logs that contain modifications in the history list. When there's a long-running transaction that writes or reads data, the HLL increases until the transaction completes or rolls back. Other transactions modify the tables that the long-running transaction uses. If your workload requires multiple open or long-running transactions, then your database might have a high HLL. Also, if purge threads can't keep up with DB changes, then you might have a high HLL. A high HLL causes higher resource use, slow and inconsistent SELECT statement performance, and an increase in storage. In extreme cases, a high HLL might cause a database outage.
To monitor your HLL, use the RollbackSegmentHistoryListLength metric of writer instance. Or, run the following command:
SHOW ENGINE INNODB STATUS;
Example output:
------------ TRANSACTIONS ------------Trx id counter 26368570695 Purge done for trx's n:o < 26168770192 undo n:o < 0 state: running but idle History list length 1839
For Aurora MySQL-Compatible, the HLL is at the cluster level. To check the HLL at the cluster level, connect to your writer instance and run the following statement:
SELECT server_id, IF(session_id = 'master_session_id', 'writer', 'reader') AS ROLE, replica_lag_in_msec, oldest_read_view_trx_id, oldest_read_view_lsn from mysql.ro_replica_status;
Note: You can use the previous statement to identify replica lag between the reader and writer nodes. Check the oldest log sequence number (LSN) that the DB uses to read from storage and the DB instance's oldest read view Transaction ID (Trx ID). Make sure that one of the instances contains an old read view.
To connect to an instance that contains an old read view, run the following statement:
SELECT a.trx_id, a.trx_state, a.trx_started, TIMESTAMPDIFF(SECOND,a.trx_started, now()) as "Seconds Transaction Has Been Open", a.trx_rows_modified, b.USER, b.host, b.db, b.command, b.time, b.state from information_schema.innodb_trx a, information_schema.processlist b where a.trx_mysql_thread_id=b.id order by trx_started;
Note: Use the previous statement to identify the session or transaction that has the oldest trx_id. To unblock the purge operation, determine whether you can end the session.
To resolve high HLL, take the following actions:
- If DML writes cause an increase in HLL, then roll back the transaction to cancel the query. This process is lengthy because of the number of updates that you must roll back.
- If a READ causes an increase in HLL, then use mysql.rds_kill_query to cancel the query.
Note: Contact your database administrator to check whether you can cancel a query.
To prevent high HLL, it's a best practice to commit the data in smaller batches. Also, don't reboot the DB cluster or instance. Purge the HLL when it can access the memory data in the buffer pool. If you reboot the database, then the survivable page cache might be lost. If the survivable page cache is lost, then data pages from the cluster volume must be read to purge the HLL. This process is slower than a purge from memory and results in additional I/O billing costs.
Related information
Monitor Amazon Aurora MySQL, Amazon RDS for MySQL and MariaDB logs with Amazon CloudWatch
- Topics
- Database
- Tags
- Aurora MySQL
- Language
- English
Related videos


Relevant content
- asked 2 years ago
- Accepted Answerasked 3 years ago
- Accepted Answerasked a year ago
AWS OFFICIALUpdated 9 months ago