Potentially degraded RDS volume - Latency & I/O spikes


Similar to this thread https://forums.aws.amazon.com/thread.jspa?messageID=905866 , we are experiencing sudden spikes in read/write latency and queue depth on a production Multi-AZ db.r3.4xlarge MySQL instance in us-east-1. Each spike lasts a few hours and then returns to normal. This has occurred every day for the past week, at around the same time of day (give or take a few hours). There is no increase in connections, workload, web-layer traffic, cron jobs, etc. Queries simply start crawling, which leads to a huge backlog of active connections and ultimately results in timed-out web requests.

We've turned on Enhanced Monitoring and see the physical device read/write IOs plummet. Physical device xvdi is the only physical device which has a huge jump in Avg Queue Size, Avg Request Size, Disk I/O Await, Disk I/O Util, Read Total, and Write Total.

We believe there to be a degradation issue with volume xvdi. This is for maindb in 939284280993. DM for more identifying info if needed.

Can someone from AWS please look into this ASAP?

Edited by: csscif on Jul 7, 2019 11:18 AM

Edited by: csscif on Jul 7, 2019 11:19 AM


csscif
asked 3 years ago, 204 views
1 Answer

Hi,
I took a look at your instance, and there is no issue with its storage volumes. Rather, you have 2 TB of gp2 storage allocated, and your baseline performance in this case is 6000 IOPS.
Your workload is consistently using more IOPS than the baseline.
Currently your burst balance is completely depleted, so you are being throttled at the baseline of 6000 IOPS.
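
For reference, the 6000 figure is consistent with the documented gp2 rule of 3 IOPS per GiB (with a floor of 100 and a cap of 16,000 IOPS). Here is a minimal Python sketch of that arithmetic, assuming the 2 TB allocation is counted as 2000 GiB; the function and constant names are illustrative only, and the sketch ignores any volume striping RDS may do behind the scenes.

```python
# Sketch of the gp2 baseline rule: 3 IOPS per GiB, min 100, max 16,000.
# Names are hypothetical; this is not an AWS API.
GP2_IOPS_PER_GIB = 3
GP2_MIN_IOPS = 100
GP2_MAX_IOPS = 16_000


def gp2_baseline_iops(size_gib: int) -> int:
    """Return the baseline IOPS for a gp2 allocation of size_gib GiB."""
    if size_gib <= 0:
        raise ValueError("size_gib must be positive")
    scaled = GP2_IOPS_PER_GIB * size_gib
    # Clamp the scaled value between the documented floor and cap.
    return min(max(GP2_MIN_IOPS, scaled), GP2_MAX_IOPS)


if __name__ == "__main__":
    # Assumption: 2 TB is treated as 2000 GiB here.
    print(gp2_baseline_iops(2000))
```

Comparing this number against the Read IOPS + Write IOPS you see in CloudWatch shows whether the workload is running above baseline.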

Here is a blog post with more info about burst versus baseline:
https://aws.amazon.com/blogs/database/understanding-burst-vs-baseline-performance-with-amazon-rds-and-gp2/

You could increase IOPS by allocating a larger gp2 volume.
In your case, because you have a legacy volume layout, the conversion to larger storage will occur online but will take about 24 hours.
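
To get a rough idea of how much storage to allocate, you can invert the gp2 rule of 3 IOPS per GiB. The following is only a sketch under that assumption: the function name and the 9000 IOPS target are made up for illustration, and it does not account for the 16,000 IOPS cap being different on other storage types or for how RDS lays out volumes internally.

```python
# Sketch: smallest gp2 allocation (in GiB) whose baseline meets a target.
# Assumes the documented gp2 rule of 3 IOPS per GiB, capped at 16,000 IOPS.
# The function name is hypothetical; this is not an AWS API.
GP2_IOPS_PER_GIB = 3
GP2_MAX_IOPS = 16_000


def gp2_size_for_iops(target_iops: int) -> int:
    """Return the minimum size in GiB needed for target_iops baseline IOPS."""
    if target_iops <= 0:
        raise ValueError("target_iops must be positive")
    if target_iops > GP2_MAX_IOPS:
        raise ValueError("gp2 baseline cannot exceed 16,000 IOPS")
    # Integer ceiling division avoids floating-point rounding surprises.
    return (target_iops + GP2_IOPS_PER_GIB - 1) // GP2_IOPS_PER_GIB


if __name__ == "__main__":
    # Hypothetical target: 50% more headroom than the current 6000 baseline.
    print(gp2_size_for_iops(9000))
```

Pick the target from your observed peak IOPS plus some headroom, not from the average.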

Alternatively, since you are using a lot of read IOPS, you might be able to tune your workload to do fewer reads.

hth,
Phil

philaws
answered 3 years ago
