How do I troubleshoot Amazon EBS performance issues with my Amazon EC2 instance?

I want to troubleshoot Amazon Elastic Block Store (Amazon EBS) issues that occur when my Amazon Elastic Compute Cloud (Amazon EC2) instance reaches Amazon EBS IOPS or throughput thresholds.

Resolution

Review metrics for volume-level or instance-level throttling

Note: If you receive errors when you run AWS Command Line Interface (AWS CLI) commands, then see Troubleshooting errors for the AWS CLI. Also, make sure that you're using the most recent AWS CLI version.

Complete the following steps:

  1. Open the Amazon CloudWatch console.
  2. In the navigation pane, under Metrics, choose All metrics.
  3. In the search box, enter your EC2 instance ID.
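  4. For instance metrics, choose EC2, and then choose Per-Instance Metrics. For volume metrics, search for your volume ID instead, choose EBS, and then choose Per-Volume Metrics.
  5. Select the metrics for your instance or volume.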

Review volume-level metrics

Note: VolumeIOPSExceededCheck and VolumeThroughputExceededCheck metrics are available for only Nitro-based instances and all volume types except magnetic (standard).

Select the following metrics for your volume:

  • For VolumeIOPSExceededCheck, a value of 1 shows that your application tried to use IOPS beyond the volume's provisioned IOPS threshold.
  • For VolumeThroughputExceededCheck, a value of 1 shows that your application tried to use throughput beyond the volume's provisioned throughput threshold.
  • For VolumeAvgIOPS, compare the value to your volume's provisioned IOPS.
  • For VolumeAvgThroughput, compare the value to your volume's provisioned throughput threshold.

To use the AWS CLI to check VolumeIOPSExceededCheck, run the following get-metric-statistics command:

aws cloudwatch get-metric-statistics \
    --namespace AWS/EBS \
    --metric-name VolumeIOPSExceededCheck \
    --dimensions Name=VolumeId,Value=VOLUME-ID \
    --start-time START-TIME \
    --end-time END-TIME \
    --period 300 \
    --statistics Maximum \
    --region REGION

Note: Replace VOLUME-ID with your Amazon EBS volume ID, START-TIME and END-TIME with your time range in ISO 8601 format, and REGION with your AWS Region.
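
As a rough sketch of how the output of this command could be summarized, the following shell function counts the periods in which the check reported 1. It assumes that you add --query 'Datapoints[].[Timestamp,Maximum]' --output text to the command so that each datapoint prints as a timestamp and a value on one line. The function name count_exceeded_periods is hypothetical.

```shell
# Hypothetical helper: count datapoints whose Maximum is 1 or higher.
# Reads lines of "TIMESTAMP VALUE" from standard input, as produced by
# --query 'Datapoints[].[Timestamp,Maximum]' --output text.
count_exceeded_periods() {
    awk '$2 + 0 >= 1 { n++ } END { print n + 0 }'
}
```

To use the function, pipe the output of the get-metric-statistics command into count_exceeded_periods. A result greater than 0 suggests that the volume reached its provisioned IOPS threshold at least once in the time range.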

If your volume exceeds its IOPS or throughput thresholds, then take one of the following actions:

Review instance-level metrics

If volume-level metrics show no throttling but performance issues continue, then check instance-level metrics.

Note: InstanceEBSIOPSExceededCheck and InstanceEBSThroughputExceededCheck metrics are available for only Nitro-based instances except bare metal instances.

Each instance type has aggregate Amazon EBS IOPS and throughput thresholds that apply across all attached volumes. When the aggregate I/O exceeds the instance's thresholds, the instance throttles the I/O.

Select the following metrics for your instance:

  • For EBSIOBalance% and EBSByteBalance%, a value below 100% shows that the instance is operating above its baseline and using burst credits. A value of 0% shows that the instance exhausted all burst credits and now operates at throttled baseline performance.
  • For InstanceEBSIOPSExceededCheck, a value of 1 shows that the aggregate I/O from all attached volumes exceeded the instance's IOPS thresholds.
  • For InstanceEBSThroughputExceededCheck, a value of 1 shows that aggregate throughput exceeded the instance's throughput thresholds.
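
The balance values above can be read with a small helper. The following is a minimal sketch that maps an EBSIOBalance% or EBSByteBalance% value to the interpretation described in the list. The function name describe_balance and its output strings are hypothetical.

```shell
# Hypothetical helper: interpret an EBSIOBalance% or EBSByteBalance% value.
# $1 = balance percentage reported by CloudWatch (0 to 100).
describe_balance() {
    awk -v b="$1" 'BEGIN {
        if (b <= 0)       print "exhausted: throttled to baseline"
        else if (b < 100) print "bursting: using burst credits"
        else              print "full: at or below baseline"
    }'
}
```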

To use the AWS CLI to check InstanceEBSIOPSExceededCheck, run the following get-metric-statistics command:

aws cloudwatch get-metric-statistics \
    --namespace AWS/EC2 \
    --metric-name InstanceEBSIOPSExceededCheck \
    --dimensions Name=InstanceId,Value=INSTANCE-ID \
    --start-time START-TIME \
    --end-time END-TIME \
    --period 300 \
    --statistics Maximum \
    --region REGION

Note: Replace INSTANCE-ID with your EC2 instance ID, START-TIME and END-TIME with your time range in ISO 8601 format, and REGION with your AWS Region.

To calculate aggregate usage across all attached volumes, use CloudWatch metric math expressions.

To calculate aggregate IOPS, use the following metric math expression:

(EBSReadOps + EBSWriteOps) / PERIOD(EBSReadOps)

To calculate aggregate throughput in bytes per second (Bps), use the following metric math expression:

(EBSReadBytes + EBSWriteBytes) / PERIOD(EBSReadBytes)
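
One way to evaluate these expressions from the AWS CLI is the get-metric-data command. The following sketch builds the query for aggregate IOPS, assuming a 300-second period and the Sum statistic. The function names, the query IDs r, w, and iops, and the AWS_CMD override are hypothetical. For aggregate throughput, the same structure should apply with EBSReadBytes and EBSWriteBytes.

```shell
# Sketch: compute aggregate EBS IOPS for one instance with metric math.
# Prints the JSON for --metric-data-queries. $1 = instance ID.
aggregate_iops_queries() {
    cat <<EOF
[
  {"Id": "r", "ReturnData": false,
   "MetricStat": {"Stat": "Sum", "Period": 300,
     "Metric": {"Namespace": "AWS/EC2", "MetricName": "EBSReadOps",
       "Dimensions": [{"Name": "InstanceId", "Value": "$1"}]}}},
  {"Id": "w", "ReturnData": false,
   "MetricStat": {"Stat": "Sum", "Period": 300,
     "Metric": {"Namespace": "AWS/EC2", "MetricName": "EBSWriteOps",
       "Dimensions": [{"Name": "InstanceId", "Value": "$1"}]}}},
  {"Id": "iops", "Expression": "(r + w) / PERIOD(r)", "Label": "AggregateIOPS"}
]
EOF
}

# $1 = instance ID, $2 = start time, $3 = end time (ISO 8601), $4 = Region.
# AWS_CMD is a hypothetical override (for example, AWS_CMD=echo previews the
# call); it defaults to the real aws command.
get_aggregate_iops() {
    ${AWS_CMD:-aws} cloudwatch get-metric-data \
        --metric-data-queries "$(aggregate_iops_queries "$1")" \
        --start-time "$2" \
        --end-time "$3" \
        --region "$4"
}
```

For example, get_aggregate_iops INSTANCE-ID START-TIME END-TIME REGION returns one AggregateIOPS value for each 5-minute period, which you can compare to the instance's Amazon EBS IOPS threshold.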

If your instance exceeds its Amazon EBS IOPS or throughput thresholds, then take one of the following actions:

  • Choose an instance type that supports higher IOPS and throughput.
  • Spread your workload across multiple instances to reduce the I/O for each instance.
  • On supported instance types, including M8g, C8g, R8i, and similar families, configure the instance bandwidth weighting to ebs-1 to increase Amazon EBS bandwidth by 25%.
  • Reduce the provisioned IOPS or throughput on volumes that don't require peak performance to stay within the instance threshold.

Check whether your volume is micro-bursting

Check volume-level metrics

CloudWatch metrics report averages in 1-minute intervals. If your workload uses high IOPS or throughput micro-bursts within 1 minute, then VolumeAvgIOPS and VolumeAvgThroughput might appear normal.

To detect micro-bursts, check VolumeIOPSExceededCheck and VolumeThroughputExceededCheck. A value of 1 shows that the volume exceeded its provisioned performance threshold within the minute, even if the average metrics appear within range.

Use detailed performance statistics

Use Amazon EBS detailed performance statistics. The statistics show you the number of microseconds that your workload tries to use higher IOPS or throughput than the volume's provisioned performance threshold.

Amazon EBS keeps statistics only for the duration of the volume's attachment to the instance. To determine what's causing the micro-bursts, Amazon EBS must capture the data before you use the statistics.

Use an OS-level tool

An Amazon EBS volume can experience micro-bursts even when VolumeIdleTime is low. For volumes with low VolumeIdleTime, use operating system (OS) tools with a granular sample collection to identify whether the workload is micro-bursting.

For Linux, run the following command:

iostat -xdmzt 1

To determine whether you reached the throughput threshold, review the rMB/s and wMB/s columns in the output. If rMB/s + wMB/s is greater than the maximum throughput for the volume, then the volume is micro-bursting.

To determine whether you reached the IOPS threshold, review the r/s and w/s columns in the output. If r/s + w/s is greater than the volume's maximum IOPS, then the volume is micro-bursting.
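
The comparison above can be automated. The following is a minimal sketch that reads the iostat output and prints each 1-second sample in which a device is above its limits. It relies on the read and write rate columns that iostat labels r/s, w/s, rMB/s, and wMB/s. The function name flag_microbursts and its arguments are hypothetical, and the limits are values that you supply for your volume.

```shell
# Hypothetical helper: flag iostat samples where a device exceeds its limits.
# $1 = device name (for example, nvme1n1), $2 = maximum IOPS,
# $3 = maximum throughput in the MB/s unit that iostat -m reports.
flag_microbursts() {
    awk -v dev="$1" -v max_iops="$2" -v max_mbps="$3" '
        # Header row: record which column holds each rate, because the
        # column order differs between sysstat versions.
        $1 ~ /^Device/ {
            for (i = 1; i <= NF; i++) col[$i] = i
            next
        }
        # Data row for the requested device.
        $1 == dev && ("r/s" in col) {
            iops = $(col["r/s"]) + $(col["w/s"])
            mbps = $(col["rMB/s"]) + $(col["wMB/s"])
            if (iops > max_iops + 0 || mbps > max_mbps + 0)
                printf "%s: %.0f IOPS, %.1f MB/s (above limit)\n", dev, iops, mbps
        }
    '
}
```

For example, iostat -xdmzt 1 | flag_microbursts nvme1n1 3000 125 prints a line for each second in which the device nvme1n1 exceeds 3,000 IOPS or 125 MB/s.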

For Windows, run the perfmon command to open Windows Performance Monitor. For more information, see Monitoring Amazon DCV performance and statistics.

For more information, see How do I identify whether my Amazon EBS volume experiences micro-bursting and make sure that it doesn't affect performance?

Check BurstBalance for burstable volumes

For gp2 volumes, Throughput Optimized HDD st1 volumes, and Cold HDD sc1 volumes, monitor the BurstBalance metric in CloudWatch. If the value is close to 0%, then the volume exhausted its burst credits and operates at baseline performance.

To resolve burst credit depletion, take one of the following actions:

  • For gp2 volumes, increase the volume size to raise the baseline performance, which scales at 3 IOPS per GiB, or migrate to a gp3 volume for consistent performance.
  • For st1 and sc1 volumes, increase the volume size or migrate to a volume type that better matches your workload.
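
To estimate how much a larger gp2 volume helps, the following sketch calculates the baseline IOPS for a given size. It assumes the documented gp2 rule of 3 IOPS per GiB with a minimum of 100 IOPS and a maximum of 16,000 IOPS. The function name gp2_baseline_iops is hypothetical.

```shell
# Hypothetical helper: baseline IOPS for a gp2 volume.
# $1 = volume size in GiB.
# Assumes 3 IOPS per GiB, a floor of 100 IOPS, and a ceiling of 16,000 IOPS.
gp2_baseline_iops() {
    awk -v gib="$1" 'BEGIN {
        iops = 3 * gib
        if (iops < 100)   iops = 100
        if (iops > 16000) iops = 16000
        print iops
    }'
}
```

For example, gp2_baseline_iops 100 prints 300, and gp2_baseline_iops 1000 prints 3000.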

Check the read and write I/O sizes

Amazon EBS counts each I/O operation as one I/O toward the volume's IOPS, up to a size of 256 KiB for SSD volumes and 1,024 KiB for HDD volumes. Larger operations count as multiple I/Os.
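
As a worked example of this accounting, the following sketch estimates how many I/Os a single request counts as. It assumes whole-number sizes in KiB and rounds partial units up, which is an assumption of this sketch. The function name counted_ios is hypothetical.

```shell
# Hypothetical helper: number of I/Os that one request counts as.
# $1 = request size in whole KiB.
# $2 = unit size in KiB (256 for SSD volumes, 1024 for HDD volumes).
# Rounding partial units up is an assumption of this sketch.
counted_ios() {
    awk -v size="$1" -v unit="$2" 'BEGIN {
        n = int((size + unit - 1) / unit)
        if (n < 1) n = 1
        print n
    }'
}
```

For example, counted_ios 1024 256 prints 4, and counted_ios 16 256 prints 1.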

To check your average read I/O size in kibibytes (KiB), use the following CloudWatch metric math expression:

Sum(VolumeReadBytes) / Sum(VolumeReadOps) / 1024

To check your average write I/O size in KiB, use the following CloudWatch metric math expression:

Sum(VolumeWriteBytes) / Sum(VolumeWriteOps) / 1024
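
The same arithmetic can be checked by hand. The following sketch applies the expressions above to a pair of Sum values that you read from CloudWatch for the same period. The function name avg_io_size_kib is hypothetical.

```shell
# Hypothetical helper: average I/O size in KiB.
# $1 = Sum of VolumeReadBytes (or VolumeWriteBytes) for the period.
# $2 = Sum of VolumeReadOps (or VolumeWriteOps) for the same period.
avg_io_size_kib() {
    awk -v bytes="$1" -v ops="$2" 'BEGIN {
        if (ops == 0) { print "n/a"; exit }   # no I/O in the period
        printf "%.1f\n", bytes / ops / 1024
    }'
}
```

For example, avg_io_size_kib 3932160000 30000 prints 128.0, which means that the average operation in that period was 128 KiB.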

For st1 and sc1 volumes, if your average I/O size is below 64 KiB, then increase the size of I/O operations to improve performance. For more information, see Amazon EBS I/O characteristics and monitoring.

Maintain an appropriate queue length

To check whether your application generates sufficient I/O, monitor VolumeQueueLength. For improved performance, maintain an average queue length of one per 1,000 provisioned IOPS. A consistently low queue length shows that your application doesn't generate enough I/O to use the provisioned performance.
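
As a quick illustration of this guideline, the following sketch converts provisioned IOPS into a target average queue length that you can compare to VolumeQueueLength. The function name target_queue_length is hypothetical.

```shell
# Hypothetical helper: target average queue length for a volume,
# using the guideline of 1 per 1,000 provisioned IOPS.
# $1 = provisioned IOPS.
target_queue_length() {
    awk -v iops="$1" 'BEGIN { printf "%.1f\n", iops / 1000 }'
}
```

For example, target_queue_length 16000 prints 16.0.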

Initialize volumes that you created from snapshots

New volumes that you create from snapshots experience higher latency on first access to each block.

To avoid higher latency, take one of the following actions:

Configure CloudWatch alarms

To detect performance issues before they affect your applications, create CloudWatch alarms for the following metrics:

  • For VolumeAvgReadLatency and VolumeAvgWriteLatency, set a threshold to alert you when I/O latency exceeds an acceptable value for your workload.
  • For VolumeAvgIOPS and VolumeAvgThroughput, set a threshold to alert you when the volume approaches its provisioned performance.
  • For VolumeStalledIOCheck, set the statistic to Maximum with a threshold of >= 1, a period of 60 seconds, and 10 out of 10 datapoints. The alarm alerts you when an Amazon EBS volume can't complete I/O operations.
  • For VolumeIOPSExceededCheck and VolumeThroughputExceededCheck, set a threshold to alert you when a volume exceeds its provisioned IOPS or throughput.
  • For EBSIOBalance% and EBSByteBalance%, set the statistic to Minimum with a threshold based on your workload's susceptibility to throttling. The alarm alerts you when burstable instances use burst credits.
  • For InstanceEBSIOPSExceededCheck and InstanceEBSThroughputExceededCheck, set a threshold to alert you when an instance exceeds its maximum Amazon EBS IOPS or throughput threshold.
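
As one possible way to create the VolumeStalledIOCheck alarm from the AWS CLI, the following sketch wraps the put-metric-alarm command with the settings described in the list. The function name, the alarm name, and the AWS_CMD override are hypothetical. The sketch doesn't set alarm actions, so the alarm only changes state until you add a notification target.

```shell
# Sketch: create a CloudWatch alarm for VolumeStalledIOCheck.
# $1 = Amazon EBS volume ID, $2 = AWS Region.
# AWS_CMD is a hypothetical override (for example, AWS_CMD=echo previews the
# call); it defaults to the real aws command.
create_stalled_io_alarm() {
    ${AWS_CMD:-aws} cloudwatch put-metric-alarm \
        --alarm-name "ebs-stalled-io-$1" \
        --namespace AWS/EBS \
        --metric-name VolumeStalledIOCheck \
        --dimensions "Name=VolumeId,Value=$1" \
        --statistic Maximum \
        --period 60 \
        --evaluation-periods 10 \
        --datapoints-to-alarm 10 \
        --threshold 1 \
        --comparison-operator GreaterThanOrEqualToThreshold \
        --region "$2"
}
```

For example, create_stalled_io_alarm VOLUME-ID REGION creates the alarm, and AWS_CMD=echo create_stalled_io_alarm VOLUME-ID REGION prints the arguments without calling AWS.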

Automatically calculate performance

To automatically calculate aggregate Amazon EBS performance, use the AWSSupport-CalculateEBSPerformanceMetrics AWS Systems Manager runbook. The runbook generates a CloudWatch dashboard for all volumes that you attached to an instance.

Related information

Amazon CloudWatch metrics for Amazon EBS

Amazon EBS-optimized instance types

Amazon EBS introduces additional performance monitoring metrics for Amazon EBS volumes

New Amazon CloudWatch metrics to monitor EC2 instances exceeding I/O performance

AWS OFFICIAL | Updated a month ago