How can I identify if my Amazon EBS volume is micro-bursting and then prevent this from happening?

I have an Amazon Elastic Block Store (Amazon EBS) volume that isn't breaching its throughput or IOPS limit in Amazon CloudWatch. But the volume appears throttled and is experiencing high latency and queue length.

Short description

CloudWatch monitors the IOPS (op/s) and throughput (byte/s) for all Amazon EBS volume types by collecting samples every minute.

Micro-bursting occurs when an EBS volume bursts to high IOPS or throughput for a period that's significantly shorter than the collection period. Because the burst lasts for only a fraction of the collection period, the per-period totals that CloudWatch reports don't reflect it.

Example: An io1 volume (one-minute collection period) with 1,000 provisioned IOPS has an application that pushes 1,000 IOPS for five seconds. Amazon EBS throttles the application when it reaches the volume's IOPS limit. At this point, the volume can't handle the workload, causing increased queue length and higher latency.

CloudWatch doesn't show that the volume breached the IOPS limit because the collection period is 60 seconds and the 1,000 IOPS occurred for only 5 seconds. For the remaining 55 seconds of the one-minute collection period, the volume remains idle. This means that the number of VolumeReadOps+VolumeWriteOps over the whole minute is 5,000 operations (1,000 IOPS * 5 seconds). This equates to an average of 83.33 IOPS over one minute (5,000 operations / 60 seconds). An average this low usually isn't a concern.

In this case, the VolumeIdleTime at the same sample time is 55 seconds because the volume is idle for the remainder of the collection period. This means that the 5,000 operations (VolumeReadOps+VolumeWriteOps) at that sample time occur over only five seconds. If you divide 5,000 by 5 to calculate the average IOPS, then you get 1,000 IOPS. 1,000 IOPS is the volume limit.
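The arithmetic in this example can be checked with a short Python sketch. The variable names are illustrative and aren't part of any AWS API. The values are the ones from the example: a five-second burst of 1,000 IOPS inside a 60-second collection period.

```python
# Values from the example: a 5-second burst inside a 60-second collection period.
period_seconds = 60
burst_iops = 1000
burst_seconds = 5

total_ops = burst_iops * burst_seconds          # VolumeReadOps + VolumeWriteOps
idle_seconds = period_seconds - burst_seconds   # VolumeIdleTime

# Average over the whole period, which is what a one-minute view suggests.
average_over_period = total_ops / period_seconds

# Average over only the time that the volume was active.
average_while_active = total_ops / (period_seconds - idle_seconds)

print(f"{average_over_period:.2f} IOPS averaged over the minute")
print(f"{average_while_active:.0f} IOPS while the volume was active")
```

The first figure is the 83.33 IOPS that looks harmless. The second is the 1,000 IOPS that the volume actually sustained during the burst.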

To determine if micro-bursting is occurring on your volume, do the following:

  1. Use CloudWatch metrics to identify possible micro-bursting.
  2. Use CloudWatch to get the micro-bursting event.
  3. Confirm micro-bursting using an OS-level tool.
  4. Prevent micro-bursting by changing your volume size or type to accommodate your applications.

Resolution

Use CloudWatch metrics to identify possible micro-bursting

1.    Check the VolumeIdleTime metric. This metric indicates the total number of seconds in a specified period of time when no read or write operations are submitted. If VolumeIdleTime is high, then the volume remained idle for most of the collection period. Sufficiently high IOPS or throughput at the same sample time indicates that micro-bursting potentially occurred.

For throughput, use the VolumeIdleTime metric together with the VolumeReadBytes and VolumeWriteBytes metrics.

2.    Use the following formula to calculate the average throughput that's reached when the volume is active:

Actual average throughput in Bytes/s = (Sum(VolumeReadBytes) + Sum(VolumeWriteBytes)) / (Period - Sum(VolumeIdleTime))

For IOPS, use the VolumeIdleTime metric together with the VolumeReadOps and VolumeWriteOps metrics.

3.    Use the following formula to calculate the average IOPS that's reached when the volume is active:

Actual average IOPS in Ops/s = (Sum(VolumeReadOps) + Sum(VolumeWriteOps)) / (Period - Sum(VolumeIdleTime))
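The two formulas have the same shape, so one helper can apply either of them to a series of one-minute datapoints. The following is a minimal sketch: the function names `active_average` and `suspect_periods` are hypothetical, and two choices are assumptions made for this sketch. A period with no active time is skipped, and a value that reaches the limit is flagged as well as one that exceeds it.

```python
from typing import List, Optional, Tuple

def active_average(read_sum: float, write_sum: float,
                   idle_sum: float, period: float = 60.0) -> Optional[float]:
    """Average rate during the time the volume was active.

    Pass byte sums for throughput or operation sums for IOPS.
    Returns None when the volume was idle for the whole period.
    """
    active_seconds = period - idle_sum
    if active_seconds <= 0:
        return None  # avoid dividing by zero for a fully idle period
    return (read_sum + write_sum) / active_seconds

def suspect_periods(datapoints: List[Tuple[float, float, float]],
                    limit: float, period: float = 60.0) -> List[int]:
    """Return the indexes of datapoints whose active-time average reaches the limit.

    Each datapoint is (read_sum, write_sum, idle_sum) for one collection period.
    """
    flagged = []
    for index, (read_sum, write_sum, idle_sum) in enumerate(datapoints):
        rate = active_average(read_sum, write_sum, idle_sum, period)
        if rate is not None and rate >= limit:
            flagged.append(index)
    return flagged

# Hypothetical one-minute operation sums: (VolumeReadOps, VolumeWriteOps, VolumeIdleTime).
samples = [(3000, 2000, 55), (600, 600, 0), (0, 0, 60)]
print(suspect_periods(samples, limit=1000))
```

In the sample data, only the first datapoint is flagged: 5,000 operations in 5 active seconds is 1,000 IOPS, while the second averages 20 IOPS and the third is fully idle.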

Use CloudWatch to get the micro-bursting event

  1. Open the CloudWatch console.
  2. Choose All Metrics.
  3. Use the volume ID to search for the volume that's affected.
  4. To view throughput metrics, choose Browse, and then add VolumeReadBytes, VolumeWriteBytes, and VolumeIdleTime.
  5. Choose Graphed metrics.
  6. For Statistics, choose Sum, and for Period, choose 1 minute.
  7. For Add Math, choose Start with empty expression.
  8. In the Details of Expression, enter the graph IDs for the Actual Average Throughput in Bytes/s formula. For example, (m1+m2)/(60-m3).

If the formula calculates a value that's greater than the maximum throughput for the volume, then micro-bursting occurred. To check the IOPS metrics, follow the preceding steps, and add VolumeReadOps, VolumeWriteOps, and VolumeIdleTime for step 4.
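The same metric math expression can also be requested programmatically. The following sketch assumes that the AWS SDK for Python (boto3) is installed and that credentials and a Region are configured. The helper names `build_queries` and `fetch_active_average` and the volume ID in the usage note are hypothetical. It uses the CloudWatch `get_metric_data` operation and, for brevity, doesn't follow `NextToken` pagination.

```python
from datetime import datetime, timedelta, timezone

def build_queries(volume_id: str, read_metric: str, write_metric: str,
                  period: int = 60) -> list:
    """Build MetricDataQueries for (m1+m2)/(period-m3) on one EBS volume."""
    def stat(query_id: str, metric_name: str) -> dict:
        return {
            "Id": query_id,
            "MetricStat": {
                "Metric": {
                    "Namespace": "AWS/EBS",
                    "MetricName": metric_name,
                    "Dimensions": [{"Name": "VolumeId", "Value": volume_id}],
                },
                "Period": period,
                "Stat": "Sum",
            },
            "ReturnData": False,  # only the expression result is returned
        }

    return [
        stat("m1", read_metric),
        stat("m2", write_metric),
        stat("m3", "VolumeIdleTime"),
        {
            "Id": "e1",
            "Expression": f"(m1+m2)/({period}-m3)",
            "Label": "ActiveAverage",
            "ReturnData": True,
        },
    ]

def fetch_active_average(volume_id: str,
                         read_metric: str = "VolumeReadBytes",
                         write_metric: str = "VolumeWriteBytes",
                         hours: int = 3) -> list:
    """Return (timestamp, value) pairs for the active-time average."""
    import boto3  # imported here so build_queries works without the SDK installed

    end = datetime.now(timezone.utc)
    cloudwatch = boto3.client("cloudwatch")
    response = cloudwatch.get_metric_data(
        MetricDataQueries=build_queries(volume_id, read_metric, write_metric),
        StartTime=end - timedelta(hours=hours),
        EndTime=end,
    )
    result = response["MetricDataResults"][0]
    return list(zip(result["Timestamps"], result["Values"]))
```

For example, `fetch_active_average("vol-0123456789abcdef0", "VolumeReadOps", "VolumeWriteOps")` would return the active-time IOPS for a placeholder volume ID. Compare each value against the volume's limit.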

Confirm micro-bursting using an OS-level tool

The preceding formulas don't always identify micro-bursting in real time. This is because the volume might be micro-bursting even if the VolumeIdleTime is low.

Example: Your volume spikes to a level that breaches the volume's limits. The volume then reduces to a very low level of activity without being completely idle for the remainder of the collection period. The VolumeIdleTime metric doesn't reflect the low activity, even though micro-bursting occurred.

To confirm micro-bursting, use an OS-level tool that has a finer granularity than CloudWatch.
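A small simulation illustrates why the idle-time formula can miss this case. The per-second values are hypothetical, and the sketch assumes that a second counts as idle only when no operation is submitted during it.

```python
# Hypothetical per-second operation counts for one 60-second collection period:
# a 5-second burst at 1,000 ops/s, then 1 op/s for the remaining 55 seconds.
ops_per_second = [1000] * 5 + [1] * 55

period = len(ops_per_second)
total_ops = sum(ops_per_second)

# Assumption: a second is idle only if no operation was submitted in it.
idle_seconds = sum(1 for ops in ops_per_second if ops == 0)

# The idle-time formula sees no idle time, so it averages over the full minute.
active_average = total_ops / (period - idle_seconds)

# A one-second view, like the one an OS-level tool provides, exposes the burst.
peak_per_second = max(ops_per_second)

print(f"Active-time average: {active_average:.2f} IOPS")
print(f"Peak one-second rate: {peak_per_second} IOPS")
```

Because the volume is never fully idle, the formula reports about 84 IOPS, while the one-second view shows the burst at 1,000 IOPS.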

Linux

Use the iostat command. For more information, see the iostat(1) Linux man page.

1.    To report I/O statistics for all your mounted volumes with one-second granularity, run the following command:

iostat -xdmzt 1

Note: The iostat tool is part of the sysstat package. If you can't find the iostat command, then run the following command to install sysstat on Amazon Linux AMIs:

sudo yum install sysstat -y

2.    To determine whether you're reaching the throughput limit, review the rMB/s and wMB/s in the output. If rMB/s + wMB/s is greater than the volume's maximum throughput, then micro-bursting is occurring.

To determine whether you're reaching the IOPS limit, review the r/s and w/s in the output. If r/s + w/s is greater than the volume's maximum IOPS, then micro-bursting is occurring.
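Scanning many seconds of iostat output by eye is tedious, so the two comparisons can be automated. The following is a minimal sketch that assumes the output was captured to text with the -m flag. The function name `find_bursts` is hypothetical. It looks up columns by header name because the column layout differs between sysstat versions, and it flags samples that reach a limit as well as those that exceed it, which is a choice made for this sketch.

```python
def find_bursts(iostat_text: str, max_iops: float, max_mb_per_s: float) -> list:
    """Return (device, iops, mb_per_s) for each sample at or above a limit."""
    columns = None
    bursts = []
    for line in iostat_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0].rstrip(":") == "Device":
            # Map column names to positions from the header row.
            columns = {name: position for position, name in enumerate(fields)}
            continue
        if columns is None or len(fields) != len(columns):
            continue  # banner and timestamp lines
        try:
            iops = float(fields[columns["r/s"]]) + float(fields[columns["w/s"]])
            mb_per_s = (float(fields[columns["rMB/s"]])
                        + float(fields[columns["wMB/s"]]))
        except (KeyError, ValueError):
            continue  # unexpected layout or non-numeric field
        if iops >= max_iops or mb_per_s >= max_mb_per_s:
            bursts.append((fields[0], iops, mb_per_s))
    return bursts

# Hypothetical, shortened sample in the shape of iostat -x output.
sample = """Device            r/s     w/s     rMB/s     wMB/s  %util
nvme1n1        900.00  300.00     10.00      5.00  99.00
nvme1n1         10.00    5.00      0.10      0.05   1.00
"""
print(find_bursts(sample, max_iops=1000, max_mb_per_s=125))
```

With the sample data, only the first row is flagged because 900 + 300 operations per second is above the 1,000 IOPS limit passed in. When you capture real output, note that the first report from iostat covers the time since boot, so disregard it.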

Windows

Run the perfmon command to open Windows Performance Monitor. For more information, see Determine your IOPS and throughput requirements.

Prevent micro-bursting by changing your volume size or type to accommodate your applications

Change the volume to a type and size that accommodates your required IOPS and throughput. For more information on volume types and their respective IOPS and throughput limits, see Amazon EBS volume types. Note that the instance also has limits on the total IOPS and throughput that it can push to all attached EBS volumes.

It's a best practice to benchmark your volumes against your workload to verify which volume types can safely accommodate your workload in a test environment. For more information, see Benchmark EBS volumes.


AWS OFFICIAL
Updated 4 months ago