RDS for MySQL instance stuck on %nice CPU peak usage

0

I can see "Other 1075 mysqld threads" using 74% CPU of the RDS MySQL instance (v8.0.33) under "Operating system process list" in "Monitoring" tab and also in Performance Insights Metrics Dashboard under "CPU utilization (Percent)": os.cpuUtilization.nice.avg = 100%

I could not connect to the instance (got timeout errors) to try to troubleshoot while the instance was at that state and I had to reboot it.

This happens every week.

1 Answer
0

Hello There,

I understand that you are observing 74% CPU Utilization under OS process list Enhanced Monitoring tab along with performance insights dashboard showing “CPU utilization (Percent)": os.cpuUtilization.nice.avg = 100% which is happening on a weekly basis and you need some insights regarding the same.

To begin with, as you correctly noticed from Enhanced monitoring tab you can check the OS process list that is associated with the concerned DB instance. Enhanced monitoring shows a maximum of 100 processes. Here, it helps you to identify which processes have the largest impact on performance based on CPU and memory use.

From your initial investigation, We can observe that there are multiple mysqld threads consuming lot of CPU resources which was observed via the OS process list monitoring tab and this Enhanced Monitoring is usually used alongside Performance Insights in order to identify the nature of the workload and the exact queries responsible for it (by checking the SQL tab).

From the investigation, you have also noticed “os.cpuUtilization.nice.avg” metric reaching upto 100% Utlliization. Here, As you have observed out of total 100% CPU utilisation the maximum CPU distribution was for the nice%, higher nice% tells about the higher load on the Database instance due to various running processes inside the database. Nice% gives priority to RDS MySQL process so that it gets more CPU cycle so it completes the process quickly.

That being said, From Monitoring OS processes and Performance insights dashboard it is clear that there is certain database load on the CPU of the instance which needs to be addressed to reduce the overall CPU consumption.

——————————

Recommendations

——————————

As you are already aware, Performance Insights samples the state of connected sessions in your DB instance every second. If a session is spending time on a database-related operation, Performance Insights records the current time, the type of operation (I/O, CPU, locking, etc.), the current SQL statement, and several other session attributes.

Here, I would highly advice you to track and note down the queries which were causing the high load on the CPU. From the Performance insights, If the Database load chart shows a bottleneck, you can find out where the load is coming from. To do so, look at the top load items table below the Database load chart. Choose a particular item, like a SQL query or a user, to drill down into that item and see details about it.

You can refer the below documentation for the same on how to identify the SQL query or a user.

[+] : Analyzing DB load by wait events - https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_PerfInsights.UsingDashboard.AnalyzeDBLoad.html

After you identify the query that is increasing CPU usage, you can optimize your workload to reduce the CPU consumption.

If you see a query that's not required for your workload, you can terminate the connection (kill the query) using the following command:

CALL mysql.rds_kill(processID);

You can find the processID of the query by running the SHOW FULL PROCESSLIST command.

If you don't want to kill the query, you can optimize the query using EXPLAIN. This shows the individual steps involved in query execution. For more information, see the MySQL Documentation for Optimizing Queries with EXPLAIN: https://dev.mysql.com/doc/refman/8.0/en/using-explain.html

Enable the PROFILING to review profile details that can indicate resource usage for statements that are executed during the current session. For more information, see the MySQL Documentation for PROFILING Syntax: https://dev.mysql.com/doc/refman/8.0/en/show-profile.html

Use ANALYZE TABLE to refresh the index statistics for the tables. This can help the optimizer choose an appropriate execution plan. For more information, see the MySQL Documentation for ANALYZE TABLE Syntax: https://dev.mysql.com/doc/refman/8.0/en/show-profile.html

Finally, I have also added a handy documentation below on "How can I troubleshoot and resolve high CPU utilization on my Amazon RDS for MySQL, MariaDB, or Aurora for MySQL instances?". This will go a long way in helping you troubleshoot and resolve this issue as well.

[+] : https://aws.amazon.com/premiumsupport/knowledge-center/rds-instance-high-cpu/

That being said, since this issue might need deeper investigation you can always open a support case with AWS Support Engineering Team. Certain non-public information might be required such as your RDS instance and metric details and information regarding your database usage hence you can open a support case with AWS using the following support link and this issue can be investigated for further troubleshooting.

[+] : AWS Support team - https://console.aws.amazon.com/support/

I hope that the above information was quite helpful. Have a great day ahead!

AWS
answered 9 months ago
  • The problem is Performance Insights cannot connect to the RDS instance in that state and collect data - there's an empty gap on the "Database load" graph in that period of time. As I mentioned, new DB sessions get timeout errors, so I cannot connect and kill any processes.

You are not logged in. Log in to post an answer.

A good answer clearly answers the question and provides constructive feedback and encourages professional growth in the question asker.

Guidelines for Answering Questions