
How do I troubleshoot high CPU utilization for Amazon RDS for PostgreSQL or Amazon Aurora PostgreSQL-Compatible instances?


I want to troubleshoot the high CPU usage in my Amazon Relational Database Service (Amazon RDS) for PostgreSQL or Amazon Aurora PostgreSQL-Compatible Edition instance.

Resolution

To determine the cause of high CPU utilization, take the following actions. Then, reduce the CPU usage of your database (DB) instance.

Review DB instance metrics

To identify whether your workload causes high CPU utilization, use Amazon CloudWatch to compare the WriteIOPS, ReadIOPS, ReadThroughput, and WriteThroughput metrics with the CPUUtilization metric. For Aurora PostgreSQL-Compatible, you can also compare the BufferCacheHitRatio metric. If these metrics spike at the same times as the CPUUtilization metric, then your workload is likely the cause of the high CPU usage.
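For example, you can pull these metrics from the AWS CLI to compare them side by side. The following is a minimal sketch that assumes a DB instance identifier of example-db; adjust the time range, region, and metric names for your environment:

```shell
# Fetch average CPUUtilization for the DB instance in 5-minute periods
aws cloudwatch get-metric-statistics \
  --namespace AWS/RDS \
  --metric-name CPUUtilization \
  --dimensions Name=DBInstanceIdentifier,Value=example-db \
  --start-time 2024-01-01T00:00:00Z \
  --end-time 2024-01-01T06:00:00Z \
  --period 300 \
  --statistics Average

# Repeat for ReadIOPS, WriteIOPS, ReadThroughput, and WriteThroughput,
# then compare the spikes against the CPUUtilization datapoints
aws cloudwatch get-metric-statistics \
  --namespace AWS/RDS \
  --metric-name ReadIOPS \
  --dimensions Name=DBInstanceIdentifier,Value=example-db \
  --start-time 2024-01-01T00:00:00Z \
  --end-time 2024-01-01T06:00:00Z \
  --period 300 \
  --statistics Average
```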

Use Enhanced Monitoring

Use Enhanced Monitoring to review the operating system (OS) for your DB instance. To collect detailed data, set the Granularity property to an interval of 1, 5, 10, 15, 30, or 60 seconds.

To troubleshoot the cause of high CPU utilization, check the LoadAverageMinute OS metric. If the load average is greater than the number of vCPUs for your DB instance class, then the instance is over-utilized. If the load average is less than the number of vCPUs, then CPU saturation probably isn't the cause of your application latency.

You can also check the OS process list for the DB instance. Enhanced Monitoring shows a maximum of 100 processes that affect the performance of your instance. To identify the queries behind the heaviest processes, match the PIDs from the Enhanced Monitoring process list against the pid column in the PostgreSQL pg_stat_activity view.
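For example, after you note a high-CPU PID in the Enhanced Monitoring process list, you can look up the session that owns it. In the following sketch, 12345 is a placeholder for the OS PID that you identified:

```sql
-- Replace 12345 with the PID from the Enhanced Monitoring process list
SELECT pid, usename, datname, state, wait_event_type, wait_event, query
FROM pg_stat_activity
WHERE pid = 12345;
```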

Use CloudWatch Database Insights

Turn on Amazon CloudWatch Database Insights to identify the query that's responsible for the DB load. Then, check the Top SQL tab on the DB load chart for the specific time that CPU utilization increases.

Check native PostgreSQL views and catalogs

For real-time issues, query pg_stat_activity or pg_stat_statements to identify the machines, clients, and IP addresses that send the most traffic. For more information, see pg_stat_activity and pg_stat_statements on the PostgreSQL website.
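For example, the following query, a minimal sketch, groups the current connections by client address, user, and database to show which clients send the most traffic:

```sql
SELECT client_addr, usename, datname, count(*) AS connections
FROM pg_stat_activity
GROUP BY client_addr, usename, datname
ORDER BY connections DESC;
```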

Turn on pg_stat_statements

Complete the following steps:

  1. Modify the following values of your custom DB parameter group:
    Add pg_stat_statements to shared_preload_libraries.
    Set track_activity_query_size to 4096.
    Set pg_stat_statements.track to ALL.
    Set pg_stat_statements.max to 10000.
  2. Choose Apply Immediately, and then reboot the DB instance.
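You can also apply these parameter changes from the AWS CLI. The following is a sketch that assumes a custom parameter group named example-parameter-group and a DB instance named example-db. The shared_preload_libraries, track_activity_query_size, and pg_stat_statements.max parameters are static, so they require a reboot to take effect:

```shell
aws rds modify-db-parameter-group \
  --db-parameter-group-name example-parameter-group \
  --parameters \
    "ParameterName=shared_preload_libraries,ParameterValue=pg_stat_statements,ApplyMethod=pending-reboot" \
    "ParameterName=track_activity_query_size,ParameterValue=4096,ApplyMethod=pending-reboot" \
    "ParameterName=pg_stat_statements.track,ParameterValue=ALL,ApplyMethod=immediate" \
    "ParameterName=pg_stat_statements.max,ParameterValue=10000,ApplyMethod=pending-reboot"

# Reboot so that the static parameter changes take effect
aws rds reboot-db-instance --db-instance-identifier example-db
```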

Select the database to monitor

Run the following query:

select current_database();

Install the extension on the current database

Run the following command:

CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

View queries that spend the most time in the database

Run the following query for your PostgreSQL version.

PostgreSQL versions 12 and earlier:

SELECT total_time, query FROM pg_stat_statements ORDER BY total_time DESC LIMIT 10;

PostgreSQL versions 13 and later:

SELECT total_plan_time+total_exec_time as total_time, query FROM pg_stat_statements ORDER BY 1 DESC LIMIT 10;

List queries with a lower buffer cache hit ratio

Run the following query for your PostgreSQL version.

PostgreSQL versions 12 and earlier:

SELECT query, calls, total_time, rows, 100.0 * shared_blks_hit / nullif(shared_blks_hit + shared_blks_read, 0) AS hit_percent FROM pg_stat_statements ORDER BY total_time DESC LIMIT 10;

PostgreSQL versions 13 and later:

SELECT query, calls, total_plan_time+total_exec_time as total_time, rows, 100.0 * shared_blks_hit / nullif(shared_blks_hit + shared_blks_read, 0) AS hit_percent FROM pg_stat_statements ORDER BY 3 DESC LIMIT 10;

Sample queries over time

Run the following query for your PostgreSQL version.

PostgreSQL versions 12 and earlier:

SELECT query, calls, total_time/calls as avg_time_ms, rows/calls as avg_rows,
temp_blks_read/calls as avg_tmp_read, temp_blks_written/calls as avg_temp_written
FROM pg_stat_statements
WHERE calls != 0
ORDER BY total_time DESC LIMIT 10;

PostgreSQL versions 13 and later:

SELECT query, calls, (total_plan_time+total_exec_time)/calls as avg_time_ms,
rows/calls as avg_rows,
temp_blks_read/calls as avg_tmp_read,
temp_blks_written/calls as avg_temp_written
FROM pg_stat_statements
WHERE calls != 0
ORDER BY 3 DESC LIMIT 10;

Check for idle connections in the database

If you have idle connections in the database, then the DB instance might use a lot of CPU. To resolve this issue, check for and end idle connections. For more information, see Performance impact of idle PostgreSQL connections.

To check for sessions whose state hasn't changed for more than 10 minutes, including long-running active sessions, run the following query:

SELECT * FROM pg_stat_activity
WHERE pid <> pg_backend_pid()
AND state in ('idle', 'idle in transaction', 'idle in transaction (aborted)', 'disabled','active')
AND state_change < current_timestamp - INTERVAL '10' MINUTE
AND usename != 'rdsadmin';

To end only the sessions that have been in an idle state for more than 10 minutes, run the following query:

SELECT pg_terminate_backend(pid) FROM pg_stat_activity
WHERE pid <> pg_backend_pid()
AND state in ('idle', 'idle in transaction', 'idle in transaction (aborted)', 'disabled')
AND state_change < current_timestamp - INTERVAL '10' MINUTE
AND usename != 'rdsadmin';

To end all idle connections, run one of the following queries:

SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE usename = 'example-username'  
AND pid <> pg_backend_pid()  
AND state in ('idle');

Note: Replace example-username with your username.

-or-

SELECT pg_terminate_backend(example-pid);

Note: Replace example-pid with your query PID.

If your application creates too many database connections, then reduce the number of connections. Or, use a connection pooler such as PgBouncer. For more information, see PgBouncer on the PgBouncer website. You can also use Amazon RDS Proxy to set up connection pools.
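As an illustration, a minimal PgBouncer configuration that pools connections in front of an RDS endpoint might look like the following sketch. The endpoint, database name, and pool sizes are placeholders; tune them for your workload:

```ini
[databases]
; The host endpoint and database name below are placeholders
exampledb = host=example-db.abcdefg12345.us-east-1.rds.amazonaws.com port=5432 dbname=exampledb

[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
auth_type = scram-sha-256
auth_file = /etc/pgbouncer/userlist.txt
; Transaction pooling returns the server connection to the pool
; at the end of each transaction
pool_mode = transaction
default_pool_size = 20
max_client_conn = 500
```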

Check for database locks

If database locks cause queries to queue and run longer, then CPU utilization might increase on your DB instance. To troubleshoot lock-related issues, check Database Insights for wait events, such as Lock:Relation, Lock:tuple, Lock:transactionid, or other lock-related events.

To identify what blocked your query, see How do I identify what blocked a query on my Amazon RDS for PostgreSQL or Aurora PostgreSQL-Compatible DB instance?

If the blocking sessions are no longer active, then use Database Insights to identify the queries that they block. To resolve this issue, end the blocking sessions.
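To see the blocker-blocked relationships directly in SQL, you can use the pg_blocking_pids function (available in PostgreSQL 9.6 and later). The following sketch lists each waiting session together with the PIDs that block it:

```sql
-- Show sessions that are waiting on a lock and the PIDs that block them
SELECT pid, pg_blocking_pids(pid) AS blocked_by,
       wait_event_type, wait_event, state, query
FROM pg_stat_activity
WHERE cardinality(pg_blocking_pids(pid)) > 0;
```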

Run the ANALYZE command

If you don't frequently run ANALYZE on tables in your database, then the queries might use more compute resources because of stale statistics in the system. For more information, see ANALYZE on the PostgreSQL website.

Autovacuum removes dead tuples from tables and reclaims space in your database. The autovacuum daemon also runs the ANALYZE command to refresh table statistics when the number of changed tuples exceeds the threshold that you configured.

To identify when autovacuum and autoanalyze last ran on the tables, run the following query:

SELECT relname, last_autovacuum, last_autoanalyze FROM pg_stat_user_tables;
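To check whether stale statistics coincide with table bloat, you can also review the dead tuple counts. The following sketch lists the tables with the most dead tuples:

```sql
SELECT relname, n_live_tup, n_dead_tup,
       round(100.0 * n_dead_tup / nullif(n_live_tup + n_dead_tup, 0), 2) AS dead_pct
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;
```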

To prevent performance issues after a major engine version upgrade, run the ANALYZE command to refresh the pg_statistic table for every database in your DB instance.

To prevent performance issues because of higher resource usage, run the following command without parameters to regenerate all statistics:

ANALYZE VERBOSE;

Check PostgreSQL error logs

Turn on query logging in Amazon RDS for PostgreSQL. Then, check the PostgreSQL error logs to confirm that your log_min_duration_statement and log_statement parameters are set as intended. For more information, see Error reporting and logging on the PostgreSQL website.
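To confirm the current logging settings on the instance, you can run the following sketch. For example, a log_min_duration_statement value of 1000 logs every statement that runs longer than one second, and -1 turns duration-based logging off:

```sql
SHOW log_min_duration_statement;
SHOW log_statement;
```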

Reduce CPU usage

To reduce CPU usage, take the following actions:

  • Use EXPLAIN and EXPLAIN ANALYZE to identify ways to tune query plans. For more information, see Using EXPLAIN on the PostgreSQL website.
  • If you identify locking sessions that aren't required, then use the locking session's PID to end the sessions.
  • If there's a query that you run repeatedly, then use prepared statements to lower the CPU use. For more information, see PREPARE on the PostgreSQL website.
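As a minimal sketch of the first and last actions, assuming a hypothetical table example_table with an integer column example_id:

```sql
-- Inspect the actual plan, timing, and buffer usage of a hot query
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM example_table WHERE example_id = 42;

-- Prepare the statement once, then execute it repeatedly with
-- different values to skip repeated parsing and planning
PREPARE example_stmt (int) AS
  SELECT * FROM example_table WHERE example_id = $1;
EXECUTE example_stmt(42);
```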

Related information

Best practices for working with PostgreSQL

OS monitoring

Understanding autovacuum in Amazon RDS for PostgreSQL environments

A case study of tuning autovacuum in Amazon RDS for PostgreSQL

AWS OFFICIAL | Updated 10 days ago