My Amazon Elastic Block Store (Amazon EBS) volume didn't reach its average throughput or average I/O operations per second (IOPS) quotas in Amazon CloudWatch. However, the volume throttled and I experienced high latency and queue length.
Resolution
By default, CloudWatch metrics collect samples at 1-minute intervals. However, I/O operations occur at a millisecond rate. When the volume experiences bursts of high IOPS or throughput for a shorter time than the collection interval, CloudWatch doesn't capture the burst. To identify whether your volume experiences performance bursts within 1 minute, take the following actions.
Use CloudWatch metrics to identify micro-bursting
Check the VolumeIOPSExceededCheck and VolumeThroughputExceededCheck metrics
The VolumeIOPSExceededCheck and VolumeThroughputExceededCheck metrics show when the IOPS or throughput that your workload tries to drive exceeds your volume's provisioned performance. The metrics monitor IOPS and throughput throughout each minute and report a consolidated signal at 1-minute granularity. Each metric reports either 0 or 1. If you get a value of 1 for either metric, then the workload is micro-bursting.
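As a minimal sketch, the following Python function scans 1-minute datapoints for either metric and returns the minutes that are flagged with 1. The datapoint shape (a list of dictionaries with Timestamp and Maximum keys) is an assumption modeled on what a CloudWatch statistics query typically returns. The function name and the sample values are hypothetical.

```python
from datetime import datetime, timezone


def minutes_exceeding(datapoints):
    """Return the sorted timestamps of datapoints whose value is 1.

    Each datapoint is assumed to be a dict with "Timestamp" and "Maximum"
    keys, one per 1-minute period.
    """
    flagged = [
        point["Timestamp"]
        for point in datapoints
        if point["Maximum"] >= 1
    ]
    return sorted(flagged)


# Hypothetical datapoints for VolumeIOPSExceededCheck (not real measurements).
sample = [
    {"Timestamp": datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc), "Maximum": 0.0},
    {"Timestamp": datetime(2024, 1, 1, 12, 1, tzinfo=timezone.utc), "Maximum": 1.0},
    {"Timestamp": datetime(2024, 1, 1, 12, 2, tzinfo=timezone.utc), "Maximum": 0.0},
]

for minute in minutes_exceeding(sample):
    print("Micro-bursting suspected in the minute starting", minute.isoformat())
```

Running the same check against both metrics shows whether the IOPS limit, the throughput limit, or both were exceeded in a given minute.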
Check the VolumeIdleTime metric
The VolumeIdleTime metric shows the total number of seconds in a specified period when no read or write operations were submitted. If VolumeIdleTime is high, then the volume was idle for most of the period. If high IOPS or throughput occurred in the same period, then the I/O was concentrated in a short window and the volume is experiencing micro-bursting.
Calculate the average throughput and average IOPS that the EBS volume receives
Use the following formula to calculate the average throughput of the EBS volume:
Estimated average throughput = ( Sum(VolumeReadBytes) + Sum(VolumeWriteBytes) ) / CEIL(Period - Sum(VolumeIdleTime))
Use the following formula to calculate the average IOPS for the EBS volume:
Estimated average IOPS = ( Sum(VolumeReadOps) + Sum(VolumeWriteOps) ) / CEIL(Period - Sum(VolumeIdleTime))
The denominator, Period - Sum(VolumeIdleTime), is the number of seconds in the period that the volume was active. When VolumeIdleTime is close to the period, this value approaches zero and can inflate the result. The CEIL function rounds the active time up to the next whole second so that values close to zero don't distort the result. For more information, see Functions supported for metric math.
If the average throughput or IOPS is greater than the maximum throughput or IOPS for the volume, then the workload experiences micro-bursting.
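The two formulas above can be sketched in Python as follows. This is an illustration under stated assumptions, not a replacement for CloudWatch metric math: the function name, the sample sums, and the 3,000 IOPS limit are hypothetical, and the guard for a fully idle period is an addition of this sketch.

```python
import math


def estimated_average(total_read, total_write, period_seconds, idle_seconds):
    """Estimate the per-second rate over the time the volume was active.

    Mirrors (Sum(read) + Sum(write)) / CEIL(Period - Sum(VolumeIdleTime)).
    Pass byte sums for throughput or operation sums for IOPS.
    """
    active_seconds = math.ceil(period_seconds - idle_seconds)
    if active_seconds <= 0:
        # Assumption of this sketch: treat a fully idle period as zero load.
        return 0.0
    return (total_read + total_write) / active_seconds


# Hypothetical 60-second period: 12,000 operations, idle for 57.4 seconds.
read_ops, write_ops = 9000, 3000
period, idle = 60, 57.4
volume_max_iops = 3000  # hypothetical provisioned limit

naive_iops = (read_ops + write_ops) / period
active_iops = estimated_average(read_ops, write_ops, period, idle)

print("Average over the whole period:", naive_iops)   # 200.0
print("Average over the active time: ", active_iops)  # 4000.0
print("Micro-bursting suspected:", active_iops > volume_max_iops)
```

In this example, the per-minute average of 200 IOPS looks far below the limit, while the estimate over the 3 active seconds exceeds it.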
Use Amazon EBS detailed performance statistics to identify micro-bursting
Use detailed performance statistics for your volume. The statistics show you the number of microseconds that your workload tries to drive higher IOPS or throughput than the volume's provisioned performance quota. EBS retains statistics only for the duration of the volume's attachment to the Amazon Elastic Compute Cloud (Amazon EC2) instance. To use the statistics to check the root cause of micro-bursting, EBS must have already captured the data.
To access the statistics, see Accessing the statistics.
Use an OS-level tool to identify micro-bursting
An EBS volume can experience micro-bursting even when VolumeIdleTime is low. For volumes with low VolumeIdleTime, use operating system (OS) tools with a granular sample collection to identify whether the workload experiences micro-bursting.
Linux
To report I/O statistics for all your mounted volumes with 1-second granularity, run the iostat command:
iostat -xdmzt 1
The iostat tool is part of the sysstat package. If you can't find the iostat command, then run the following command to install sysstat on Amazon Linux Amazon Machine Images (AMIs):
sudo yum install sysstat -y
For more information, see iostat on the Linux man website.
To determine whether you reached the throughput quota, review the rMB/s and wMB/s columns in the output. If rMB/s + wMB/s is greater than the maximum throughput for the volume, then the volume experiences micro-bursting.
To determine whether you reached the IOPS quota, review the r/s and w/s columns in the output. If r/s + w/s is greater than the volume's maximum IOPS, then the volume experiences micro-bursting.
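If you save the iostat output to a file, a short script can apply both checks to every 1-second sample. The following Python sketch assumes that each report has a header line that starts with Device and includes the r/s, w/s, rMB/s, and wMB/s columns, and that values use a period as the decimal separator. The function name, device name, limits, and trimmed sample output are hypothetical.

```python
def find_bursts(iostat_text, device, max_iops, max_mbps):
    """Return (iops, mbps) for each sample of `device` that exceeds a limit."""
    bursts = []
    header = None
    for line in iostat_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0].startswith("Device"):
            # Remember the column names so values are looked up by name.
            header = fields
            continue
        if header and fields[0] == device and len(fields) == len(header):
            row = dict(zip(header[1:], (float(v) for v in fields[1:])))
            iops = row["r/s"] + row["w/s"]
            mbps = row["rMB/s"] + row["wMB/s"]
            if iops > max_iops or mbps > max_mbps:
                bursts.append((iops, mbps))
    return bursts


# Hypothetical, trimmed sample (real iostat -x output has more columns).
sample_output = """\
Device r/s w/s rMB/s wMB/s %util
nvme1n1 120.00 80.00 1.50 1.00 12.00

Device r/s w/s rMB/s wMB/s %util
nvme1n1 2500.00 900.00 90.00 40.00 99.00
"""

# Hypothetical limits: 3,000 IOPS and 125 MB/s.
for iops, mbps in find_bursts(sample_output, "nvme1n1", 3000, 125):
    print(f"Sample exceeds the limits: {iops:.0f} IOPS, {mbps:.0f} MB/s")
```

The first report that iostat prints covers the time since the system booted, so you might want to discard it before you run a check like this one.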
Windows
Run the perfmon command to open Windows Performance Monitor.
Change your volume size or type to accommodate your applications and prevent micro-bursting
Micro-bursting might cause I/O throttling or I/O latency in your application. To prevent this issue, modify the volume to a type and size that accommodates your required IOPS and throughput, even at micro-bursting levels.
There's a maximum IOPS and throughput that the instance can push to all attached EBS volumes. For more information, see Amazon EBS-optimized instance types.
It's a best practice to benchmark your volumes against your workload in a test environment to determine the volume types that can safely accommodate the workload.