EBS volume performance (What am I doing wrong?)


Hello, I have EC2 instances (r6.xlarge) with an extra EBS volume (gp3, mounted at /opt). Every few months I need to upgrade the application on the instances. That upgrade also requires the data living on /opt to be upgraded, which is a time-consuming process. So I do the following: I upgrade the app and the data on one EC2 instance, then take a snapshot of that volume to provide the "upgraded data" to the other instances, so they only need the correct app version. This works in the sense that the other instances do not have to upgrade the data but just validate it, which is a much quicker process, but it still requires quite a lot of I/O performance.
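
Roughly, the workflow looks like this (just a sketch; the volume/snapshot/instance IDs, device name, and availability zone are placeholders, not my real values):

    # Snapshot the upgraded /opt volume on the reference instance
    aws ec2 create-snapshot --volume-id vol-0123456789abcdef0 \
        --description "opt data after app/data upgrade"

    # Create a new volume from that snapshot for each of the other instances
    aws ec2 create-volume --snapshot-id snap-0123456789abcdef0 \
        --availability-zone eu-central-1a --volume-type gp3

    # Attach it so the instance only has to validate the data instead of upgrading it
    aws ec2 attach-volume --volume-id vol-0fedcba9876543210 \
        --instance-id i-0123456789abcdef0 --device /dev/sdf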

I am trying to optimize the validation phase by using the fast snapshot restore (FSR) option. I also increased the volume's IOPS to 10000 and its throughput to 500 MB/s. But when the new instances start with the newly created volume and need to go through that validation phase, they barely seem to reach that volume performance. I am looking at atop and see the following:

DSK |       nvme1n1 |  busy    100% |  read    3612 |  write      6 |  discrd     0  | KiB/r    126  | KiB/w      9  | MBr/s   44.6  | MBw/s    0.0  | avio 2.76 ms  |

So it seems to me that the volume hardly gets above 3500 IOPS and about 50 MB/s of throughput. So my question is, what am I doing wrong? Am I looking at the wrong metrics? Did I forget something in setting up the volume to reach the 10000 IOPS? I understood that volumes of 400 GB+ can adjust their IOPS/throughput, no?

asked 3 months ago · 281 views
5 Answers

Hi,

You may want to follow this Knowledge Center article and check whether everything it suggests is set up properly in your use case: https://repost.aws/knowledge-center/optimize-ebs-provisioned-iops

Best,

Didier

answered 3 months ago

Hey,

So it seems to me that the volume hardly gets above 3500 IOPS and about 50 MB/s of throughput.

Yes, as per the atop output. But your volume is busy doing I/O (100% utilization) throughout that interval.

So my question is, what am I doing wrong? Am I looking at the wrong metrics? Did I forget something in setting up the volume to reach the 10000 IOPS?

During the validation phase, run "sudo iostat -xdmzt 1 300" and share the results here. In addition, keep an eye on either the CloudWatch metrics or the metrics displayed in the "Monitoring" tab of that volume in the AWS console, in particular latency and queue length.
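
For example (a sketch; the volume ID, time window, and region are placeholders you would replace with your own):

    # Extended per-device stats, 1-second samples for 5 minutes, during validation
    sudo iostat -xdmzt 1 300 > /tmp/iostat-validation.log

    # Queue length for the volume from CloudWatch (AWS/EBS namespace);
    # VolumeTotalReadTime divided by VolumeReadOps gives average read latency
    aws cloudwatch get-metric-statistics \
        --namespace AWS/EBS --metric-name VolumeQueueLength \
        --dimensions Name=VolumeId,Value=vol-0123456789abcdef0 \
        --start-time 2024-01-01T00:00:00Z --end-time 2024-01-01T01:00:00Z \
        --period 300 --statistics Average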

I would also like to know exactly what happens during the validation phase.

I understood that volumes of 400 GB+ can adjust their IOPS/throughput, no?

I believe you are using a gp3 volume, and gp3 does not adjust its IOPS/throughput on its own. Burstable volumes like gp2 can burst (IOPS) for a certain period of time.

When you enable FSR for a snapshot taken from the upgraded volume, make sure the FSR metric "FastSnapshotRestoreCreditsBalance" for that snapshot has at least 1 credit before you restore it (create a volume) for the other instances.
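
You can check this before creating the volumes, for example (a sketch; the snapshot ID is a placeholder):

    # Confirm FSR is enabled for the snapshot in the target availability zone
    aws ec2 describe-fast-snapshot-restores \
        --filters Name=snapshot-id,Values=snap-0123456789abcdef0

    # The credit balance itself is the CloudWatch metric
    # FastSnapshotRestoreCreditsBalance in the AWS/EBS namespace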

The information below is just a note on EBS performance limits:

When it comes to EBS volume performance limits, you also have to consider the limits of your EC2 instance type. Each EC2 instance type has certain limits on the traffic it can drive to its EBS volumes. For example, see the "Amazon EBS-optimized instances" documentation for the r6i.xlarge limits.

The r6i.xlarge instance type has a baseline and a burst bandwidth limit for EBS traffic. At most it can handle 1250 MB/s and 40K IOPS, for 30 minutes at a time, once every 24 hours. This is the aggregate bandwidth for all EBS volumes attached to that instance. Once the burst credits are used up, the aggregate bandwidth and IOPS drop to 156.25 MB/s and 6K IOPS. So if you configure an EBS volume with 10K IOPS and 500 MB/s throughput and attach it to this instance, the volume's performance will be met during the instance's burst period, but once the r6i.xlarge has used all its burst credits it will only allow 156.25 MB/s and 6K IOPS to the EBS volume.
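
You can read these limits directly from the API, for example (a sketch):

    # Baseline and maximum EBS bandwidth, throughput, and IOPS for the instance type
    aws ec2 describe-instance-types --instance-types r6i.xlarge \
        --query 'InstanceTypes[0].EbsInfo.EbsOptimizedInfo'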

iostat will be able to capture this throttling. Whenever throttling occurs, you will see the queue length increase and elevated latency in the EBS CloudWatch metrics.

answered 2 months ago

Thank you for the link, I will have a look. However, I am not using Provisioned IOPS volumes but general purpose ones. The reason is that for the most part I don't need that much. It's just the validation phase after the upgrade where I need a lot more than usual. Once validated I would be perfectly fine with the gp3 defaults of 3000 IOPS / 125 MB/s ...
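
Something along these lines is what I have in mind (a sketch; the volume ID is a placeholder):

    # Before the validation phase: raise the gp3 volume's performance
    aws ec2 modify-volume --volume-id vol-0123456789abcdef0 \
        --iops 10000 --throughput 500

    # After validation: go back to the gp3 baseline
    # (note: a volume can only be modified again after roughly six hours)
    aws ec2 modify-volume --volume-id vol-0123456789abcdef0 \
        --iops 3000 --throughput 125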

answered 3 months ago

Other than "use Provisioned IOPS" ... shouldn't gp3 be able to do more than 3000/125? If so, how?

answered 3 months ago

Hi,

Check these steps to resolve the issue:

Monitor Instance Performance

  • Use tools like top, iostat, and vmstat to assess CPU, memory, and disk I/O utilization (see the quick commands below).
  • Identify potential bottlenecks and optimize accordingly (e.g., increase the instance type, adjust application settings).
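
For example (a sketch using standard Linux tools; adjust the sampling intervals as needed):

    top -b -n 1 | head -n 20    # one-shot CPU and memory snapshot
    vmstat 1 10                 # run queue, swapping, and I/O wait over 10 seconds
    iostat -xm 1 10             # per-device utilization and throughput in MB/s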

Verify EBS

  • Check the volume's IOPS and throughput settings in the AWS Management Console (or from the CLI, as shown in the example after this list).
  • Ensure the volume is healthy and in the expected state (it shows as "in-use" once attached).
  • Consider using the EBS volume metrics to monitor actual performance.
  1. Data Compression: Compressing data on the EBS volume can reduce storage space and potentially improve I/O performance.
  2. File System Optimization: Ensure the file system used on the volume is configured optimally for performance (e.g., appropriate block size, mount options).
  3. Application-Level Optimization: Consider application-specific optimizations to reduce I/O requirements.
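
As mentioned above, the volume's configured IOPS, throughput, and state can also be confirmed from the CLI (a sketch; the volume ID is a placeholder):

    aws ec2 describe-volumes --volume-ids vol-0123456789abcdef0 \
        --query 'Volumes[0].{Type:VolumeType,Iops:Iops,Throughput:Throughput,State:State}'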

By systematically addressing these areas, you should be able to identify and resolve the performance bottlenecks affecting your validation process.

answered 3 months ago
