Potentially degraded RDS volume - Latency & I/O spikes


Similar to this thread https://forums.aws.amazon.com/thread.jspa?messageID=905866 , we are experiencing sudden spikes in read/write latency and queue depth on a production Multi-AZ db.r3.4xlarge MySQL instance in us-east-1. Each spike lasts a few hours and then returns to normal. This has occurred every day for the past week, at around the same time of day (give or take a few hours). There is no increase in connections, workload, web-layer traffic, cron jobs, etc. Queries simply start crawling, which leads to a large backlog of active connections and ultimately to timed-out web requests.

We've turned on Enhanced Monitoring and see the physical device read/write IOs plummet. xvdi is the only physical device that shows a huge jump in Avg Queue Size, Avg Request Size, Disk I/O Await, Disk I/O Util, Read Total, and Write Total.

We believe there is a degradation issue with volume xvdi. This is for maindb in 939284280993. DM for more identifying info if needed.

Can someone from AWS please look into this ASAP?

Edited by: csscif on Jul 7, 2019 11:18 AM

Edited by: csscif on Jul 7, 2019 11:19 AM

csscif
asked 5 years ago, 922 views
1 Answer

Hi,
I took a look at your instance, and there is no issue with its storage volumes. Rather, you have 2 TB of gp2 allocated storage, and the baseline performance for that size is 6000 IOPS.
Your workload is consistently using more IOPS than the baseline.
Your burst balance is currently completely depleted, so you are being throttled to the baseline of 6000 IOPS.
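
For reference, the 6000 figure can be reproduced with a rough sketch of the published gp2 sizing rule. It assumes the standard rule of 3 IOPS per GiB with a 100 IOPS floor and a 16,000 IOPS cap, and it treats "2 TB" as 2000 GiB. The function name is made up for illustration, and the sketch ignores how RDS lays out the underlying volumes.

```python
# Assumed gp2 constants (standard published rule, not read from this instance).
GP2_IOPS_PER_GIB = 3
GP2_MIN_IOPS = 100
GP2_MAX_IOPS = 16_000


def gp2_baseline_iops(size_gib: int) -> int:
    """Return the approximate gp2 baseline IOPS for a given allocated size in GiB."""
    if size_gib <= 0:
        raise ValueError("size_gib must be positive")
    # Scale linearly with size, then clamp to the floor and the cap.
    scaled = GP2_IOPS_PER_GIB * size_gib
    return min(max(scaled, GP2_MIN_IOPS), GP2_MAX_IOPS)


# 2000 GiB of gp2 storage, as in this thread.
print(gp2_baseline_iops(2000))
```

Once the burst balance is gone, sustained demand above this number shows up as queueing and latency rather than as a volume fault.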

Here is a blog post with more info about burst versus baseline:
https://aws.amazon.com/blogs/database/understanding-burst-vs-baseline-performance-with-amazon-rds-and-gp2/

You could increase IOPS by allocating a larger gp2 volume.
In your case, because you have a legacy volume layout, the conversion to larger storage will occur online but will take about 24 hours.
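
To estimate how much storage to allocate, the same rule can be inverted. This is only a sketch under the assumptions of 3 IOPS per GiB and a 16,000 IOPS cap. The helper name and the 9000 IOPS target are hypothetical, and RDS minimum storage sizes are not modeled.

```python
import math

# Assumed gp2 constants (standard published rule, not read from this instance).
GP2_IOPS_PER_GIB = 3
GP2_MAX_IOPS = 16_000


def gp2_size_for_iops(target_iops: int) -> int:
    """Return a gp2 size in GiB whose baseline meets target_iops."""
    if target_iops <= 0:
        raise ValueError("target_iops must be positive")
    if target_iops > GP2_MAX_IOPS:
        raise ValueError("target exceeds the assumed gp2 cap; consider provisioned IOPS")
    # Round up so the resulting baseline is never below the target.
    return math.ceil(target_iops / GP2_IOPS_PER_GIB)


# Hypothetical target: 50% more than the current 6000 IOPS baseline.
print(gp2_size_for_iops(9000))
```

Pick the target from your observed peak read plus write IOPS in CloudWatch, with some headroom, rather than from the current baseline.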

Alternatively, since you are using a lot of read IOPS, you might be able to tune your workload to do fewer reads.

hth,
Phil

AWS
MODERATOR
philaws
answered 5 years ago
