How do I troubleshoot Amazon EBS performance issues with my Amazon EC2 instance?
I want to troubleshoot Amazon Elastic Block Store (Amazon EBS) issues that occur when my Amazon Elastic Compute Cloud (Amazon EC2) instance reaches Amazon EBS IOPS or throughput thresholds.
Resolution
Review metrics for volume-level or instance-level throttling
Note: If you receive errors when you run AWS Command Line Interface (AWS CLI) commands, then see Troubleshooting errors for the AWS CLI. Also, make sure that you're using the most recent AWS CLI version.
Complete the following steps:
- Open the Amazon CloudWatch console.
- In the navigation pane, under Metrics, choose All metrics.
- In the search box, enter your EC2 instance ID.
- Choose EC2, and then choose Per-Instance Metrics.
- Select metrics for either your instance or volume.
Review volume-level metrics
Note: The VolumeIOPSExceededCheck and VolumeThroughputExceededCheck metrics are available only for volumes that are attached to Nitro-based instances. They're available for all volume types except magnetic (standard).
Select the following metrics for your volume:
- For VolumeIOPSExceededCheck, a value of 1 shows that your application tried to use IOPS beyond the volume's provisioned IOPS threshold.
- For VolumeThroughputExceededCheck, a value of 1 shows that your application tried to use throughput beyond the volume's provisioned throughput threshold.
- For VolumeAvgIOPS, compare the value to your volume's provisioned IOPS.
- For VolumeAvgThroughput, compare the value to your volume's provisioned throughput threshold.
To use the AWS CLI to check VolumeIOPSExceededCheck, run the following get-metric-statistics AWS CLI command:
aws cloudwatch get-metric-statistics \
    --namespace AWS/EBS \
    --metric-name VolumeIOPSExceededCheck \
    --dimensions Name=VolumeId,Value=VOLUME-ID \
    --start-time START-TIME \
    --end-time END-TIME \
    --period 300 \
    --statistics Maximum \
    --region REGION
Note: Replace VOLUME-ID with your Amazon EBS volume ID, START-TIME and END-TIME with your time range in ISO 8601 format, and REGION with your AWS Region.
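You can generate the START-TIME and END-TIME values instead of typing them by hand. The following is a minimal sketch that assumes GNU date, as found on Amazon Linux. The helper name metric_window is hypothetical and isn't part of the AWS CLI.

```shell
# Print ISO 8601 UTC start and end times that cover the last N minutes.
# Assumes GNU date: the "-d @EPOCH" form is not available in BSD/macOS date.
metric_window() {
  minutes="${1:-60}"
  end_epoch=$(date -u +%s)
  start_epoch=$((end_epoch - minutes * 60))
  echo "$(date -u -d "@$start_epoch" +%Y-%m-%dT%H:%M:%SZ) $(date -u -d "@$end_epoch" +%Y-%m-%dT%H:%M:%SZ)"
}

# Split the two values for use as START-TIME and END-TIME.
window=$(metric_window 60)
start_time=${window% *}
end_time=${window#* }
echo "--start-time $start_time --end-time $end_time"
```

A window of one hour with a period of 300 seconds returns up to 12 datapoints for each metric.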
If your volume exceeds its IOPS or throughput thresholds, then take one of the following actions:
- For Provisioned IOPS SSD io1 volumes, io2 Block Express volumes, or General Purpose SSD gp3 volumes, modify the volume to increase provisioned IOPS. For gp3 volumes, independently increase provisioned throughput.
- For only General Purpose SSD gp2 volumes, increase the volume size to increase baseline IOPS. A gp2 volume provides three IOPS per GiB, up to 16,000 IOPS at 5,334 GiB.
- Switch to a volume type that supports higher performance for your workload.
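To estimate how much a gp2 resize helps, you can work through the three IOPS per GiB rule. The following sketch uses hypothetical helper names. The 100 IOPS floor comes from the Amazon EBS documentation rather than from this article, so treat it as an assumption.

```shell
# Baseline IOPS for a gp2 volume: 3 IOPS per GiB, capped at 16,000.
# Assumption: gp2 volumes have a 100 IOPS minimum (from EBS documentation).
gp2_baseline_iops() {
  iops=$(( $1 * 3 ))
  if [ "$iops" -lt 100 ]; then iops=100; fi
  if [ "$iops" -gt 16000 ]; then iops=16000; fi
  echo "$iops"
}

# Smallest gp2 size in GiB whose baseline meets a target IOPS value (rounds up).
gp2_size_for_iops() {
  echo $(( ($1 + 2) / 3 ))
}

gp2_baseline_iops 500      # prints 1500
gp2_size_for_iops 16000    # prints 5334
```

If the size that you calculate is much larger than the capacity you need, then a gp3 volume with provisioned IOPS might be a better fit.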
Review instance-level metrics
If volume-level metrics show no throttling but performance issues continue, then check instance-level metrics.
Note: InstanceEBSIOPSExceededCheck and InstanceEBSThroughputExceededCheck metrics are available for only Nitro-based instances except bare metal instances.
Each instance type has aggregate IOPS and throughput Amazon EBS thresholds across all attached volumes. When the aggregate I/O exceeds the instance's thresholds, the instance throttles the I/O.
Select the following metrics for your instance:
- For EBSIOBalance% and EBSByteBalance%, a value below 100% shows that the instance is operating above its baseline and using burst credits. A value of 0% shows that the instance exhausted all burst credits and now operates at throttled baseline performance.
- For InstanceEBSIOPSExceededCheck, a value of 1 shows that the aggregate I/O from all attached volumes exceeded the instance's IOPS thresholds.
- For InstanceEBSThroughputExceededCheck, a value of 1 shows that aggregate throughput exceeded the instance's throughput thresholds.
To use the AWS CLI to check InstanceEBSIOPSExceededCheck, run the following get-metric-statistics command:
aws cloudwatch get-metric-statistics \
    --namespace AWS/EC2 \
    --metric-name InstanceEBSIOPSExceededCheck \
    --dimensions Name=InstanceId,Value=INSTANCE-ID \
    --start-time START-TIME \
    --end-time END-TIME \
    --period 300 \
    --statistics Maximum \
    --region REGION
Note: Replace INSTANCE-ID with your EC2 instance ID, START-TIME and END-TIME with your time range in ISO 8601 format, and REGION with your AWS Region.
To calculate aggregate usage across all attached volumes, use CloudWatch metric math expressions.
To calculate aggregate IOPS, use the following metric math expression:
(EBSReadOps + EBSWriteOps) / PERIOD(EBSReadOps)
To calculate aggregate throughput in bytes per second (Bps), use the following metric math expression:
(EBSReadBytes + EBSWriteBytes) / PERIOD(EBSReadBytes)
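To check the arithmetic behind these expressions outside of CloudWatch, you can reproduce it with sample values. The following sketch uses a hypothetical helper named aggregate_rate and made-up Sum values for a 300-second period.

```shell
# Average rate over one CloudWatch period: (read + write) / period in seconds.
# Works for operation counts (IOPS) and for byte counts (bytes per second).
aggregate_rate() {
  awk -v r="$1" -v w="$2" -v p="$3" 'BEGIN { printf "%.1f\n", (r + w) / p }'
}

# Sample EBSReadOps and EBSWriteOps sums over 300 seconds.
aggregate_rate 900000 300000 300              # prints 4000.0

# Sample EBSReadBytes and EBSWriteBytes sums over 300 seconds.
aggregate_rate 47185920000 15728640000 300    # prints 209715200.0
```

In this example, the instance averages 4,000 IOPS and 209,715,200 Bps (200 MiBps) across all attached volumes. Compare those results to the baseline and maximum values for your instance type.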
If your instance exceeds its Amazon EBS IOPS or throughput thresholds, then take one of the following actions:
- Choose an instance type that supports higher IOPS and throughput.
- Spread your workload across multiple instances to reduce the I/O for each instance.
- On supported instance types, including M8g, C8g, R8i, and similar families, configure the instance bandwidth weighting to ebs-1 to increase Amazon EBS bandwidth by 25%.
- Reduce the provisioned IOPS or throughput on volumes that don't require peak performance to stay within the instance threshold.
Check whether your volume is micro-bursting
Check volume-level metrics
CloudWatch metrics report averages in 1-minute intervals. If your workload uses high IOPS or throughput micro-bursts within 1 minute, then VolumeAvgIOPS and VolumeAvgThroughput might appear normal.
To detect micro-bursts, check VolumeIOPSExceededCheck and VolumeThroughputExceededCheck. A value of 1 shows that the volume exceeded its provisioned performance threshold within the minute, even if the average metrics appear within range.
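To see why a 1-minute average can hide a micro-burst, consider a hypothetical volume with 3,000 provisioned IOPS. The following sketch assumes a workload that saturates the volume for 10 seconds and then runs at 300 IOPS for the rest of the minute. The helper name and the numbers are illustrative only.

```shell
# Average IOPS over a 60-second window that contains one burst.
# Arguments: IOPS during the burst, burst length in seconds,
# and IOPS for the remainder of the minute.
minute_average_iops() {
  echo $(( ($1 * $2 + $3 * (60 - $2)) / 60 ))
}

minute_average_iops 3000 10 300   # prints 750
```

The average of 750 IOPS looks well within the 3,000 IOPS threshold, even though the volume was at its limit for 10 seconds of that minute.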
Use detailed performance statistics
Use Amazon EBS detailed performance statistics. The statistics show you the number of microseconds that your workload tries to use higher IOPS or throughput than the volume's provisioned performance threshold.
Amazon EBS keeps statistics only for the duration of the volume's attachment to the instance. To determine what's causing the micro-bursts, Amazon EBS must capture the data before you use the statistics.
Use an OS-level tool
An Amazon EBS volume can experience micro-bursts even when VolumeIdleTime is low. For volumes with low VolumeIdleTime, use operating system (OS) tools with a granular sample collection to identify whether the workload is micro-bursting.
For Linux, run the following command:
iostat -xdmzt 1
To determine whether you reached the throughput threshold, review the rMB/s and wMB/s columns in the output. If rMB/s + wMB/s is greater than the maximum throughput for the volume, then the volume is micro-bursting.
To determine whether you reached the IOPS threshold, review the r/s and w/s columns in the output. If r/s + w/s is greater than the volume's maximum IOPS, then the volume is micro-bursting.
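To avoid reading the iostat output by eye, you can filter it. The following sketch uses a hypothetical function named flag_microbursts that sums the iostat read and write columns (r/s, w/s, rMB/s, and wMB/s) for one device and prints only the samples that exceed the limits that you pass in. It locates the columns by header name because the column order differs between sysstat versions.

```shell
# Flag iostat samples whose combined read and write rates exceed the limits.
# Usage: iostat -xdmzt 1 | flag_microbursts DEVICE MAX_IOPS MAX_MBPS
flag_microbursts() {
  awk -v dev="$1" -v max_iops="$2" -v max_mbps="$3" '
    # Header row: remember which column holds each rate.
    $1 ~ /^Device/ {
      for (i = 1; i <= NF; i++) col[$i] = i
      next
    }
    # Data row for the requested device, after a header has been seen.
    $1 == dev && ("r/s" in col) {
      iops = $(col["r/s"]) + $(col["w/s"])
      mbps = $(col["rMB/s"]) + $(col["wMB/s"])
      if (iops > max_iops || mbps > max_mbps)
        printf "BURST iops=%.0f mbps=%.1f\n", iops, mbps
    }
  '
}

# Example (not run here): a device with limits of 3,000 IOPS and 125 MBps.
# iostat -xdmzt 1 | flag_microbursts nvme1n1 3000 125
```

The device name and limits in the example are placeholders. Use lsblk to confirm the device name for your volume.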
For Windows, run the perfmon command to open Windows Performance Monitor. For more information, see Monitoring Amazon DCV performance and statistics.
For more information, see How do I identify whether my Amazon EBS volume experiences micro-bursting and make sure that it doesn't affect performance?
Check BurstBalance for burstable volumes
For gp2 volumes, Throughput Optimized HDD st1 volumes, and Cold HDD sc1 volumes, monitor the BurstBalance metric in CloudWatch. If the value is close to 0%, then the volume exhausted its burst credits and operates at baseline performance.
To resolve burst credit depletion, take one of the following actions:
- For gp2 volumes, increase the volume size to raise the baseline to three IOPS per GiB, or migrate to a gp3 volume for consistent performance.
- For st1 and sc1 volumes, increase the volume size or migrate to a volume type that better matches your workload.
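To estimate how quickly a gp2 volume drains its burst credits, you can use the figures from the Amazon EBS documentation. The following sketch assumes a bucket of 5.4 million I/O credits and a 3,000 IOPS burst rate, which don't appear in this article, and a hypothetical helper name.

```shell
# Seconds that a full gp2 burst bucket lasts at the burst rate.
# Assumptions (from EBS documentation, not this article):
# 5.4 million I/O credits, 3,000 IOPS burst rate, 100 IOPS minimum baseline.
gp2_burst_seconds() {
  baseline=$(( $1 * 3 ))
  if [ "$baseline" -lt 100 ]; then baseline=100; fi
  if [ "$baseline" -ge 3000 ]; then
    echo "no-burst"   # baseline already meets or exceeds the burst rate
  else
    echo $(( 5400000 / (3000 - baseline) ))
  fi
}

gp2_burst_seconds 100    # prints 2000
gp2_burst_seconds 500    # prints 3600
```

Under these assumptions, a 100 GiB gp2 volume sustains 3,000 IOPS for about 33 minutes before BurstBalance reaches 0%.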
Check the read and write I/O sizes
For SSD volumes, Amazon EBS counts each I/O operation of up to 256 KiB as one I/O. For HDD volumes, it counts each I/O operation of up to 1,024 KiB as one I/O. Larger operations count as multiple I/Os against the volume's IOPS.
To check your average read I/O size in kibibytes (KiB), use the following CloudWatch metric math expression:
Sum(VolumeReadBytes) / Sum(VolumeReadOps) / 1024
To check your average write I/O size in KiB, use the following CloudWatch metric math expression:
Sum(VolumeWriteBytes) / Sum(VolumeWriteOps) / 1024
For st1 and sc1 volumes, if your average I/O size is below 64 KiB, then increase the size of I/O operations to improve performance. For more information, see Amazon EBS I/O characteristics and monitoring.
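To sanity-check the I/O size calculation with your own numbers, you can reproduce it locally. The following sketch uses hypothetical helper names and sample values. The counted_ios function reflects one reading of how Amazon EBS counts operations that are larger than the per-operation maximum.

```shell
# Average I/O size in KiB from summed byte and operation counts.
avg_io_kib() {
  awk -v b="$1" -v o="$2" 'BEGIN {
    if (o == 0) print "n/a"; else printf "%.1f\n", b / o / 1024
  }'
}

# Number of I/Os counted for one request of SIZE_KIB, given the
# per-operation maximum in KiB (256 for SSD, 1024 for HDD). Rounds up.
counted_ios() {
  echo $(( ($1 + $2 - 1) / $2 ))
}

avg_io_kib 1048576000 4000   # prints 256.0
counted_ios 1024 256         # prints 4
```

In this example, a 1,024 KiB request to an SSD volume counts as four I/Os, so large sequential workloads can reach the IOPS threshold sooner than the operation count suggests.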
Maintain an appropriate queue length
To check whether your application generates sufficient I/O, monitor VolumeQueueLength. For improved performance, maintain an average queue length of one per 1,000 provisioned IOPS. A consistently low queue length shows that your application doesn't generate enough I/O to use the provisioned performance.
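To turn the guideline of one queued I/O per 1,000 provisioned IOPS into a number that you can compare to VolumeQueueLength, you can use a small calculation. The helper name in the following sketch is hypothetical.

```shell
# Suggested average queue length: one outstanding I/O per 1,000 provisioned IOPS.
target_queue_length() {
  awk -v iops="$1" 'BEGIN { printf "%.1f\n", iops / 1000 }'
}

target_queue_length 16000   # prints 16.0
target_queue_length 3000    # prints 3.0
```

If VolumeQueueLength stays well below the result, then the application might be the limiting factor rather than the volume.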
Initialize volumes that you created from snapshots
New volumes that you create from snapshots experience higher latency on first access to each block.
To avoid higher latency, take one of the following actions:
- Turn on Amazon EBS fast snapshot restore (FSR) for frequently accessed snapshots.
- Specify a Provisioned Rate for Volume Initialization to initialize the volume at a predictable rate. Supported rates range from 100 to 300 MiBps.
- Before you use the volume in production, read all blocks to manually initialize volumes. For example, run the dd or fio command.
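For the manual option, the following sketch prints a dd or fio command that reads every block of a device once. It prints the command instead of running it so that you can confirm the device name first. The function name is hypothetical, and the flags follow the commonly documented read pass. Adjust the flags for your environment.

```shell
# Print (not run) a command that reads every block of DEVICE once.
# Usage: print_init_command dd|fio DEVICE
print_init_command() {
  tool="$1"; device="$2"
  case "$tool" in
    dd)  echo "sudo dd if=$device of=/dev/null bs=1M" ;;
    fio) echo "sudo fio --filename=$device --rw=read --bs=1M --iodepth=32 --ioengine=libaio --direct=1 --name=volume-initialize" ;;
    *)   echo "usage: print_init_command dd|fio DEVICE" >&2; return 1 ;;
  esac
}

print_init_command fio /dev/nvme1n1
```

The device name /dev/nvme1n1 is a placeholder. Both commands only read from the device and write to /dev/null or discard the data, so they don't modify the volume.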
Configure CloudWatch alarms
To detect performance issues before they affect your applications, create CloudWatch alarms for the following metrics:
- For VolumeAvgReadLatency and VolumeAvgWriteLatency, set a threshold to alert you when I/O latency exceeds an acceptable value for your workload.
- For VolumeAvgIOPS and VolumeAvgThroughput, set a threshold to alert you when the volume approaches its provisioned performance.
- For VolumeStalledIOCheck, set the statistic to Maximum with a threshold of >= 1, a period of 60 seconds, and 10 out of 10 datapoints. The alarm alerts you when an Amazon EBS volume can't complete I/O operations.
- For VolumeIOPSExceededCheck and VolumeThroughputExceededCheck, set a threshold to alert you when a volume exceeds its provisioned IOPS or throughput.
- For EBSIOBalance% and EBSByteBalance%, set the statistic to Minimum with a threshold based on your workload's susceptibility to throttling. The alarm alerts you when burstable instances use burst credits.
- For InstanceEBSIOPSExceededCheck and InstanceEBSThroughputExceededCheck, set a threshold to alert you when an instance exceeds its maximum Amazon EBS IOPS or throughput threshold.
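As an example, the VolumeStalledIOCheck settings above map to the put-metric-alarm AWS CLI command. The following sketch prints the command so that you can review it before you run it. The function name, the alarm name format, and the Amazon SNS topic ARN are assumptions for illustration.

```shell
# Print (not run) a put-metric-alarm command for VolumeStalledIOCheck:
# Maximum statistic, threshold >= 1, 60-second period, 10 out of 10 datapoints.
# Usage: print_stalled_io_alarm VOLUME-ID SNS-TOPIC-ARN
print_stalled_io_alarm() {
  volume_id="$1"; topic_arn="$2"
  cat <<EOF
aws cloudwatch put-metric-alarm \\
    --alarm-name ebs-stalled-io-$volume_id \\
    --namespace AWS/EBS \\
    --metric-name VolumeStalledIOCheck \\
    --dimensions Name=VolumeId,Value=$volume_id \\
    --statistic Maximum \\
    --period 60 \\
    --evaluation-periods 10 \\
    --datapoints-to-alarm 10 \\
    --threshold 1 \\
    --comparison-operator GreaterThanOrEqualToThreshold \\
    --alarm-actions $topic_arn
EOF
}

print_stalled_io_alarm vol-0123456789abcdef0 arn:aws:sns:us-east-1:111122223333:ebs-alerts
```

The volume ID and topic ARN in the example are placeholders. You can adapt the same pattern to the other metrics in the list by changing the metric name, namespace, statistic, and threshold.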
Automatically calculate performance
To automatically calculate aggregate Amazon EBS performance, use the AWSSupport-CalculateEBSPerformanceMetrics AWS Systems Manager runbook. The runbook generates a CloudWatch dashboard for all volumes that you attached to an instance.
Related information
Amazon CloudWatch metrics for Amazon EBS
Amazon EBS-optimized instance types
Amazon EBS introduces additional performance monitoring metrics for Amazon EBS volumes
New Amazon CloudWatch metrics to monitor EC2 instances exceeding I/O performance