I finally got some news from AWS support. They recently applied a patch to RDS instances that seems to be causing the issue, as the problem started after they applied the patch. They also say:
"It appears that this is a known issue that is currently occurring with RDS PostgreSQL for some instance classes. Unfortunately I do not have a specific list of impacted instances as it seems to be an internal issue, and I cannot provide you with an ETA for a fix, but I can confirm that the internal team is actively working on this issue and will deploy a fix as soon as possible."
That's absolutely wonderful news!
Thanks for doing this and posting. We're hanging on by a thread here in a couple environments.
JuanM, we seem to be seeing relief on our end. Wondered if you were seeing the same?
It seems that someone else is having the same issue. We're reluctant to pay the (expensive) AWS Support fees for what appears to be an issue on AWS's side of the Shared Responsibility Model.
We are experiencing the same issue across all our RDS PostgreSQL databases (6 instances) for the last 3 days.
We have spikes of ReadIOPS every 15 minutes. Other metrics such as ReadLatency and DiskQueueDepth are affected as well, and CPU usage is also elevated during the ReadIOPS spikes.
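For anyone trying to confirm the same cadence on their own instances, here is a minimal sketch of how you could check whether spikes land on a fixed interval. It assumes you've already exported ReadIOPS datapoints from CloudWatch (e.g. at a 60-second period) as (timestamp, value) pairs; the series below is synthetic, and the `spike_intervals` helper and the 1000-IOPS threshold are hypothetical choices, not anything from AWS.

```python
# Hypothetical sketch: detect a fixed spike interval in ReadIOPS datapoints
# exported from CloudWatch. The data here is synthetic; real values would
# come from the CloudWatch API at a 60 s period.

def spike_intervals(datapoints, threshold):
    """Return the gaps (in seconds) between datapoints above threshold.

    datapoints: list of (unix_timestamp, read_iops) tuples, sorted by time.
    """
    spike_times = [t for t, v in datapoints if v > threshold]
    return [b - a for a, b in zip(spike_times, spike_times[1:])]

# Synthetic series: ~50 IOPS baseline with a spike every 900 s (15 min).
points = []
for t in range(0, 3600, 60):
    value = 2000 if t % 900 == 0 else 50
    points.append((t, value))

gaps = spike_intervals(points, threshold=1000)
print(gaps)  # -> [900, 900, 900]: a constant 15-minute cadence
```

A perfectly constant gap like this, independent of your workload, is the kind of evidence worth attaching to a support case.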
I tried rebooting the instance with failover to the secondary zone (Multi-AZ), but it didn't solve the problem.
I had to increase the storage just to speed up BurstBalance recovery and avoid an outage from exhausting the credits.
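For context on why adding storage helps BurstBalance: gp2 volumes earn I/O credits at a baseline rate of 3 IOPS per provisioned GiB (with a 100 IOPS minimum), so a larger volume both sustains more IOPS and refills its burst bucket faster. A quick back-of-the-envelope calculation, with illustrative sizes rather than anyone's actual setup:

```python
# gp2 baseline IOPS scale with volume size: 3 IOPS/GiB, minimum 100.
# Sizes below are illustrative, not the poster's actual configuration.

def gp2_baseline_iops(size_gib):
    return max(100, 3 * size_gib)

for size in (100, 200, 400):
    print(size, "GiB ->", gp2_baseline_iops(size), "IOPS baseline")
# 100 GiB -> 300 IOPS baseline
# 200 GiB -> 600 IOPS baseline
# 400 GiB -> 1200 IOPS baseline
```

Doubling the volume doubles the credit refill rate, which is why growing storage buys breathing room even when you don't need the space.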
I reported the issue to AWS support but I still don't have an answer.
We ended up finding that if we moved our database from a db.t3.medium to a db.t3.large the impact of whatever was going on was reduced. Additionally, moving it to a db.m6g.large eliminated the effects completely. The effects came back when we moved the instance back to a db.t3.medium, which we feel is best suited for the read-replica (and had been serving us just fine for years).
Can you guys check yours? I think mine stopped about 5 hours ago. Never mind, it came back after I changed my instance size back.
@fikrimi, we still have the issue here. @ssmith, my guess about why moving to db.t3.large reduced the impact is that you doubled the memory (from 4 GiB to 8 GiB), so the workload is more likely to fit entirely in memory now. Whatever was reading data from disk every 15 minutes is now reading it from memory, without impacting ReadIOPS. But I'm pretty sure that if the instance upgrade only reduced the problem, the issue is still there.
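The memory-doubling guess above lines up with how RDS sizes the buffer cache: the default `shared_buffers` on RDS PostgreSQL is `{DBInstanceClassMemory/32768}`, expressed in 8 kB pages, which works out to roughly 25% of instance RAM. A small sketch of the arithmetic (instance RAM figures are AWS-published sizes for the t3 family; the helper function is illustrative):

```python
# Default RDS PostgreSQL shared_buffers = {DBInstanceClassMemory/32768},
# counted in 8 kB pages -> roughly 25% of instance memory.
# The helper below is an illustration of that formula, not an RDS API.

GIB = 1024 ** 3
PAGE = 8192  # PostgreSQL buffer page size in bytes

def default_shared_buffers_bytes(ram_bytes):
    return (ram_bytes // 32768) * PAGE  # ~ram/4

for name, ram_gib in (("db.t3.medium", 4), ("db.t3.large", 8)):
    buf = default_shared_buffers_bytes(ram_gib * GIB)
    print(name, "->", buf // GIB, "GiB shared_buffers")
# db.t3.medium -> 1 GiB shared_buffers
# db.t3.large -> 2 GiB shared_buffers
```

So the larger instance doesn't just add OS page cache; it doubles the PostgreSQL buffer pool too, making it much easier for the mystery reader's working set to stay resident.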
Yep, mine no longer shows the reads when I use a t3.xlarge while trying to upgrade to PostgreSQL 11. I left it overnight and didn't see the read spike anymore, so I thought it had been fixed. Then I scaled it back down to its original size (t3.small) and it came back. At least on medium it doesn't chug through all the burst... sigh.
@JuanM I'd agree that the memory increase only hid the situation. I just want to know why it happened all of a sudden. No ramp-up in the previous days; one day we woke up and the load was there.
This needs to be looked at by AWS Support for your instance and underlying EBS disks, but from what you describe, it seems like it may be on the AWS end if it's happening at an exact 15-minute interval without ANY of your workload. If you haven't already, please open a case with AWS Support; they should be able to troubleshoot this and diagnose what may be happening with the EBS disks used by your RDS read replica.
Same case here: https://thumbs2.imgbox.com/ea/69/BCVUsMyC_t.png. It started after migrating from PostgreSQL 9.3 to PostgreSQL 13. It resolves if you have twice as much memory available, but at that cost...