PostgreSQL Read Replica Shows High Read IO Every 15 Minutes
We have an RDS PostgreSQL read replica that spikes to 3,000 read IOPS for 1-2 minutes exactly every 15 minutes. It started overnight yesterday with no prior indication. There do not appear to be any queries or jobs touching the primary database or the read replica that would be causing it. Manual backups happen once nightly. Replication shows no current lag. Over time the spikes drain our EBS burst credits, and rebuilding the read replica reproduces the same behavior.
We're at a loss as to what to look at next to pin down the cause. pg_stat_activity doesn't show any queries running at that time.
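Since the spikes recur on a fixed 15-minute cadence, one option is to poll pg_stat_activity on the replica across a full spike window rather than checking it by hand, to catch any short-lived queries a manual check would miss. A minimal sketch, assuming a Python client with psycopg2 (the endpoint and credentials are placeholders):

```python
# Poll pg_stat_activity on the replica across one spike window to catch any
# short-lived queries that a single manual check might miss.
# Connection details (REPLICA_HOST, user, password) are placeholders.
import time
import psycopg2

REPLICA_HOST = "my-replica.xxxxxxxx.us-east-1.rds.amazonaws.com"  # hypothetical endpoint

conn = psycopg2.connect(host=REPLICA_HOST, dbname="postgres",
                        user="readonly", password="...", connect_timeout=5)
conn.autocommit = True

QUERY = """
    SELECT now() AS sampled_at, pid, state, wait_event_type, wait_event,
           left(query, 120) AS query
    FROM pg_stat_activity
    WHERE state <> 'idle' AND pid <> pg_backend_pid();
"""

# Sample every 5 seconds for ~20 minutes, which covers at least one full spike.
for _ in range(240):
    with conn.cursor() as cur:
        cur.execute(QUERY)
        for row in cur.fetchall():
            print(row)
    time.sleep(5)
```

If nothing shows up in these samples even while ReadIOPS is spiking, that points away from client queries and toward something happening below the SQL layer.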
I finally got some news from AWS Support. They recently applied a patch to RDS instances, and the problem started right after the patch went out, so it appears to be the cause. They also say:
"It appears that this is a known issue that is currently occurring with RDS PostgreSQL for some instance classes.Unfortunately I do not have a specific list of impacted instances as it seems to be an internal issue and I cannot provide you with an ETA for a fix, but I can confirm that the internal team is actively working on this issue and will deploy a fix as soon as possible."
That's absolutely wonderful news!
Thanks for doing this and posting. We're hanging on by a thread here in a couple environments.
JuanM, we seem to be seeing relief on our end. Are you seeing the same?
It seems that someone else is having the same issue. We're reluctant to pay (expensive) AWS Support fees for what appears to be an AWS problem, squarely on AWS's side of the Shared Responsibility Model.
We have been experiencing the same issue in all of our RDS PostgreSQL databases (6 instances) for the past 3 days.
We see spikes of ReadIOPS every 15 minutes. Other metrics such as ReadLatency and DiskQueueDepth are affected as well, and CPU usage also appears to be affected during the ReadIOPS spikes.
I tried rebooting the instance with failover to the secondary zone (Multi-AZ), but it didn't solve the problem.
I had to increase the storage just to improve BurstBalance recovery and avoid an outage from exhausting the credits.
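For anyone wondering why adding storage helps here: gp2 baseline performance is 3 IOPS per provisioned GiB (minimum 100), and the 5.4M-credit burst bucket refills whenever actual usage stays below that baseline, so a larger volume both drains more slowly during a spike and recovers faster between spikes. A rough sketch of the per-cycle credit math; the 3,000 IOPS spike figure comes from the reports above, but the volume sizes, spike duration, and idle load are purely illustrative assumptions:

```python
# Rough gp2 burst-credit math (illustrative numbers, assuming gp2 storage).
# Baseline is 3 IOPS per GiB (min 100); usage above baseline drains the
# 5.4M-credit bucket, usage below it refills the bucket.

CREDIT_BUCKET = 5_400_000  # I/O credits in a full gp2 bucket

def baseline_iops(size_gib: int) -> int:
    return max(100, 3 * size_gib)

def net_credits_per_cycle(size_gib, spike_iops=3000, spike_secs=90,
                          cycle_secs=900, idle_iops=50):
    base = baseline_iops(size_gib)
    drain = max(0, spike_iops - base) * spike_secs                  # spent during the spike
    refill = max(0, base - idle_iops) * (cycle_secs - spike_secs)   # earned while quiet
    return refill - drain

for size in (100, 200, 400):
    net = net_credits_per_cycle(size)
    print(f"{size} GiB: baseline {baseline_iops(size)} IOPS, "
          f"net credits per 15-min cycle: {net:+,}")
```

With these assumed numbers a small volume loses credits every cycle while a larger one recovers fully between spikes, which is why growing the volume buys breathing room without fixing the underlying cause.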
I reported the issue to AWS support but I still don't have an answer.
We ended up finding that moving our database from a db.t3.medium to a db.t3.large reduced the impact of whatever is going on. Moving it to a db.m6g.large eliminated the effects completely. The effects came back when we moved the instance back to a db.t3.medium, which we feel is best suited for the read replica (and had been serving us just fine for years).
Can you guys check yours? I thought mine had stopped about 5 hours ago, but never mind, it came back after I changed my instance size back.
@fikrimi, we still have the issue here. @ssmith, my guess about why moving to db.t3.large reduced the impact is that you doubled the memory (from 4 GiB to 8 GiB), so the workload is more likely to fit completely in memory now. Whatever is reading data from disk every 15 minutes is now reading it from memory, so it no longer shows up in ReadIOPS. But if the instance upgrade only reduced the problem, I'm pretty sure the underlying issue is still there.
Yep, mine doesn't show the reads anymore on a t3.xlarge (which I was using to try the upgrade to PostgreSQL 11). I left it overnight and didn't see the read spikes, so I thought it had been fixed, but when I scaled it back down to its original size (t3.small) the spikes came back. At least on a medium it doesn't chew through all the burst credits... sigh.
@JuanM I'd agree that the memory increase only hid the situation. I just want to know why it happened all of a sudden. There was no ramp-up in the previous days; one day we simply woke up and the load was there.
This needs to be looked at by AWS Support for your instance and the underlying EBS volumes, but from what you describe it seems like it may be coming from the AWS side if it happens at an exact 15-minute interval without ANY of your workload running. If you have not already, please open a case with AWS Support; they should be able to troubleshoot this for you and diagnose what may be happening on the EBS volumes used by your RDS read replica. The CloudWatch sketch below is one way to gather evidence for that case.
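While waiting on the case, the 15-minute cadence is easy to document from CloudWatch at one-minute resolution, and attaching the raw datapoints usually speeds up the conversation with support. A small sketch with boto3 (the instance identifier and region are placeholders):

```python
# Pull per-minute ReadIOPS and BurstBalance for the replica so the 15-minute
# cadence can be shown to AWS Support. Identifier and region are placeholders.
from datetime import datetime, timedelta, timezone
import boto3

cw = boto3.client("cloudwatch", region_name="us-east-1")
end = datetime.now(timezone.utc)
start = end - timedelta(hours=3)

for metric in ("ReadIOPS", "BurstBalance"):
    resp = cw.get_metric_statistics(
        Namespace="AWS/RDS",
        MetricName=metric,
        Dimensions=[{"Name": "DBInstanceIdentifier", "Value": "my-read-replica"}],
        StartTime=start,
        EndTime=end,
        Period=60,
        Statistics=["Maximum"],
    )
    points = sorted(resp["Datapoints"], key=lambda p: p["Timestamp"])
    print(f"--- {metric} ---")
    for p in points:
        print(p["Timestamp"].strftime("%H:%M"), round(p["Maximum"], 1))
```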
Same case here: https://thumbs2.imgbox.com/ea/69/BCVUsMyC_t.png. It started after migrating from PostgreSQL 9.3 to PostgreSQL 13. It resolves if you have twice as much memory available, but the cost of that...