- Newest
- Most votes
- Most comments
Based on your description of sudden IO wait spikes in your RDS PostgreSQL instance, this sounds like an I/O bottleneck issue. Here are some likely causes and solutions:
Possible causes:
-
Connection spikes: A sudden increase in database connections can overwhelm your instance, causing I/O waits. Check your CloudWatch metrics for spikes in the
DatabaseConnectionsmetric. -
Storage throughput limits: Your RDS instance may be hitting the IOPS limits of your storage volume. If you're using gp2 storage, you might be reaching its burstable IOPS limit during peak load.
-
Sequential scans: Your queries might be performing sequential scans on large tables, which can cause
IO:DataFileReadwait events when the required pages aren't available in shared memory. -
Buffer pool size: If your shared buffer pool is too small for your workload, it will cause excessive disk reads.
-
Background processes: Multiple vacuum workers running simultaneously or maintenance operations can cause I/O contention.
Recommended actions:
-
Check wait events: Look for
IO:DataFileReadevents, which occur when a connection waits for a backend process to read a page from storage because it's not in shared memory. -
Monitor storage metrics: Review CloudWatch metrics like DiskQueueDepth, ReadLatency, WriteLatency, ReadIOPS, WriteIOPS, ReadThroughput, and WriteThroughput to confirm I/O bottlenecks.
-
Upgrade storage: If you're using gp2 storage, consider upgrading to gp3, which allows scaling IOPS independently from storage and has a higher baseline IOPS.
-
Scale up your instance: A higher-capacity instance class provides more memory for the shared buffer pool and more vCPUs to handle connections.
-
Optimize for read workloads: If applicable, take advantage of reader nodes or read replicas to distribute read operations and reduce I/O pressure on the writer node.
-
Check for table bloat: Tables consuming unnecessary space can cause performance issues. Use the pgstattuple extension to identify tables with bloat.
-
Review autovacuum settings: Ensure autovacuum is properly configured to prevent table bloat.
The fact that rebooting temporarily helps suggests that this could be related to memory management or buffer cache issues that are temporarily resolved by clearing memory, but then quickly recur as the system operates.
Sources
Capture and diagnose I/O bottlenecks on Amazon RDS for SQL Server | AWS Database Blog
IO:DataFileRead - Amazon Aurora
RDS for PostgreSQL wait events - Amazon Relational Database Service
IO:DataFileRead - Amazon Relational Database Service
RDS for PostgreSQL wait events - Amazon Relational Database Service
answered 10 months ago
Relevant content
asked a year ago

2nd dev here.
We’ve been running a stable RDS PostgreSQL instance (db.t3.medium) for quite a while without issues. Starting yesterday, we noticed a sudden and sustained slowdown in query performance across our application, without any major changes on our side.
Key observations so far:
Table Aat ~35%,Table Bat ~12%). This suggests far more data is being read from storage instead of memory.So far we’ve:
Given these findings, we are concerned this may be related to the underlying EBS volumes or networking between RDS and EBS, which is outside of our control.
Question: Can AWS confirm whether there are any known issues with the underlying EBS or networking in our region/availability zone, or if our instance may be experiencing degraded I/O capacity?
Can't even download the error log files. Just downloads a 0 byte file.