The CheckpointWriteLatency metric shows an increasing trend, while other metrics remain healthy. Does this increase indicate current or potential database issues?
The CheckpointWriteLatency metric (also written as checkpoint_write_latency) appears in CloudWatch after you enable the Advanced mode of Database Insights.
CheckpointWriteLatency indicates the total amount of time spent in the portion of checkpoint processing where files are written to disk. The purpose of a checkpoint is to ensure that all dirty buffers created up to a certain point are flushed to disk, so that the WAL up to that point can be recycled.
This metric is calculated using the following formula:
checkpoint_write_latency = checkpoint_write_time / (checkpoints_timed + checkpoints_req)
where:
- checkpoint_write_time: the total amount of time spent writing dirty buffers to disk during checkpoint operations.
- checkpoints_timed: the number of scheduled (routine) checkpoints that have been performed on the database.
- checkpoints_req: the number of requested (on-demand/manual) checkpoints that have been performed on the database.
The calculation above shows a direct correlation between checkpoint_write_time and checkpoint_write_latency: whenever checkpoint_write_time increases, checkpoint_write_latency increases along with it.
checkpoint_write_time represents the cumulative time across all checkpoints since the last reset of PostgreSQL statistics. As a result, checkpoint_write_latency is also a cumulative total, which explains the gradual increase.
**Where does checkpoint_write_time come from?**
checkpoint_write_time is derived from the pg_stat_bgwriter view (for PG 16 and below) and the pg_stat_checkpointer view (for PG 17 and above). You can check these views in your RDS PostgreSQL and Aurora PostgreSQL databases using:
For PG16 and below:
postgres=> select * from pg_stat_bgwriter;
 checkpoints_timed | checkpoints_req | checkpoint_write_time | checkpoint_sync_time | buffers_checkpoint | buffers_clean | maxwritten_clean | buffers_backend | buffers_backend_fsync | buffers_alloc |          stats_reset
-------------------+-----------------+-----------------------+----------------------+--------------------+---------------+------------------+-----------------+-----------------------+---------------+-------------------------------
             18435 |               3 |               5151401 |               138161 |              48969 |             0 |                0 |              16 |                     0 |        125835 | 2025-04-29 10:06:00.923886+00
(1 row)
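Applying the formula above to this sample output gives approximately the value the CloudWatch metric would report: checkpoint_write_latency = 5151401 / (18435 + 3) ≈ 279 ms per checkpoint (checkpoint_write_time is reported in milliseconds).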
For PG17 and above:
postgres=> select * from pg_stat_checkpointer;
num_timed | num_requested | restartpoints_timed | restartpoints_req | restartpoints_done | write_time | sync_time | buffers_written | stats_reset
-----------+---------------+---------------------+-------------------+--------------------+------------+-----------+-----------------+-------------------------------
3 | 3 | 0 | 0 | 0 | 46218 | 15 | 1428 | 2025-02-18 02:26:20.375139+00
(1 row)
As more writes arrive on the database, checkpoint_write_time (write_time in PG 17 and above) continues to increase, and so does the CheckpointWriteLatency CloudWatch metric.
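If you want to see the value this formula produces directly from the database, a minimal query sketch follows (the alias approx_checkpoint_write_latency_ms is just an illustrative name, and the exact aggregation CloudWatch applies may differ slightly):
For PG16 and below:
-- NULLIF avoids division by zero right after a statistics reset
SELECT round(checkpoint_write_time::numeric
             / NULLIF(checkpoints_timed + checkpoints_req, 0), 2) AS approx_checkpoint_write_latency_ms,
       stats_reset
FROM pg_stat_bgwriter;
For PG17 and above:
-- Same calculation against the checkpointer view
SELECT round(write_time::numeric
             / NULLIF(num_timed + num_requested, 0), 2) AS approx_checkpoint_write_latency_ms,
       stats_reset
FROM pg_stat_checkpointer;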
To bring CheckpointWriteLatency down, you can reset the checkpointer statistics:
For PG16 and below:
postgres=> SELECT pg_stat_reset_shared('bgwriter');
pg_stat_reset_shared
----------------------
(1 row)
postgres=> select * from pg_stat_bgwriter;
 checkpoints_timed | checkpoints_req | checkpoint_write_time | checkpoint_sync_time | buffers_checkpoint | buffers_clean | maxwritten_clean | buffers_backend | buffers_backend_fsync | buffers_alloc |          stats_reset
-------------------+-----------------+-----------------------+----------------------+--------------------+---------------+------------------+-----------------+-----------------------+---------------+-------------------------------
                 0 |               0 |                     0 |                    0 |                  0 |             0 |                0 |               0 |                     0 |            33 | 2025-07-02 11:01:29.007252+00
(1 row)
For PG17 and above:
postgres=> SELECT pg_stat_reset_shared('checkpointer');
pg_stat_reset_shared
----------------------
(1 row)
postgres=> select * from pg_stat_checkpointer;
num_timed | num_requested | restartpoints_timed | restartpoints_req | restartpoints_done | write_time | sync_time | buffers_written | stats_reset
-----------+---------------+---------------------+-------------------+--------------------+------------+-----------+-----------------+-------------------------------
0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2025-07-05 03:54:58.363981+00
(1 row)
NOTE: pg_stat_reset_shared([ target text ]) resets cluster-wide statistics counters to zero for the specified target only (no other view is affected). target can be bgwriter, checkpointer, io, and so on; for the complete list, refer to the pg_stat_reset_shared documentation:
- bgwriter: Reset all the counters shown in the pg_stat_bgwriter view.
- checkpointer: Reset all the counters shown in the pg_stat_checkpointer view.
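To confirm that a reset took effect on the database side, you can also check the stats_reset timestamp of the relevant view, for example:
For PG16 and below:
postgres=> select stats_reset from pg_stat_bgwriter;
For PG17 and above:
postgres=> select stats_reset from pg_stat_checkpointer;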
After resetting the statistics, wait a few minutes (preferably 5 minutes or more) for CloudWatch to pick up the change. You will observe that CheckpointWriteLatency has come down to 0 and then starts increasing again as the workload flows on the database.
Does CheckpointWriteLatency indicate any issues for my database?
No, an increasing CheckpointWriteLatency does not by itself indicate a performance impact on your database. It is an ever-increasing number that only reflects the cumulative cluster-wide statistics for the bgwriter/checkpointer. No action is needed from your side.
To learn how to turn on the Advanced mode of Database Insights, refer to Turning on the Advanced mode of Database Insights for Amazon RDS and Turning on the Advanced mode of Database Insights for Amazon Aurora.