RDS Write IOPS quadrupled after database restart

Hello,

Our setup:
RDS db.m4.large, MySQL

Two weeks ago we had to restart our database (2018-04-07, 19:30 UTC). After the restart our Write IOPS increased drastically without any other changes: we don’t have more traffic, we don’t have more write queries. The only thing we did was exactly this restart.

As you can see in this screenshot from CloudWatch (https://my.hidrive.com/lnk/vPJFq34A, blue is the master, green is the read replica), before the restart the read replica always had the same level of Write IOPS as the master. After the restart, the Write IOPS on the master database are 2 to 4 times as high as on the read replica. This makes absolutely no sense to us and we need help urgently.

We’ve already looked into all other RDS metrics and nothing else has changed (not Queue Depth, not Read IOPS, not Write Throughput).
The only other change we see is that the Read Latency increased from 0.5 ms to 0.8 ms.

Also very interesting (screenshots https://my.hidrive.com/lnk/VpJFKWvM, https://my.hidrive.com/lnk/cEpFKJJ1): when we look at the "Enhanced Monitoring" metrics, the Write IO/s always correlated with the data in CloudWatch before the restart. Now, after the restart, the metrics in Enhanced Monitoring are still at the same level as before the restart and no longer correlate with the CloudWatch statistics.

We’ve already tried to restart our database once again with no effect.

Questions:

  • Why did our Write IOPS explode after the restart?
  • Why are our Write IOPS no longer at the same level as on the read replica, as they were before the restart?
  • Why do the Write IOPS no longer correlate with the Write IO/s in Enhanced Monitoring?

Thank you very much for your help.

Edited by: r123 on Apr 20, 2018 7:30 AM

r123
asked 6 years ago, 419 views
3 Answers

Did the performance change, or have only the IOPS increased?
The first thing to check is whether the throughput changed. If the throughput has not changed, it is likely that the average I/O size is smaller: the same number of bytes is being written in more, smaller operations.
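To make the average I/O size point concrete, here is a minimal sketch in Python. The function name and the sample numbers are made up for illustration and are not taken from the instance in question; the only assumption is that throughput is given in bytes per second and IOPS in operations per second.

```python
def average_io_size_kib(write_throughput_bytes_per_s: float, write_iops: float) -> float:
    """Return the average size of one write operation in KiB.

    Average I/O size is simply throughput divided by operations per second.
    """
    if write_iops <= 0:
        raise ValueError("write_iops must be positive")
    return write_throughput_bytes_per_s / write_iops / 1024


# Hypothetical values: throughput stays at 4 MiB/s while IOPS quadruple.
throughput = 4 * 1024 * 1024  # bytes per second

before = average_io_size_kib(throughput, 200)  # fewer, larger writes
after = average_io_size_kib(throughput, 800)   # more, smaller writes

print(f"before: {before:.2f} KiB per write, after: {after:.2f} KiB per write")
```

With constant throughput, four times the IOPS means each write is on average a quarter of the size. In CloudWatch, the inputs would be the WriteThroughput and WriteIOPS metrics for the instance over the same period.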

Please send me the instance identifier and the region.
thanks,

AWS
MODERATOR
philaws
answered 6 years ago

Hello Phil,

thank you for your answer.

Regarding your questions:
  • The performance did not change.
  • The throughput also hasn't changed.

What do you mean by average IO Size? Why would this change with a database restart and why should the average IO size be different on the master database compared to the read replica database?

I've sent you the instance identifier and region in a private message.

Best regards.

Edited by: r123 on Apr 26, 2018 9:03 AM

r123
answered 6 years ago

Hi r123,

I experienced a similar issue after modifying my database instance from a db.r3 to a db.r4 instance type, including an identical discrepancy between the CloudWatch and the Enhanced Monitoring metrics. I followed up with AWS Support and received confirmation that there is a known issue affecting the RDS WriteIOPS metric reported in CloudWatch for db.m4 and db.r4 instance types. There is no ETA for a fix.

answered 6 years ago
