Measuring egress traffic from EC2 instances to Elastic Block Store volumes

Hi, everyone!

I am looking to conduct some measurements on AWS. Specifically, I wish to measure the egress network traffic from AWS EC2 instances to Elastic Block Store (EBS) volumes. To achieve this, I have used network monitoring tools like tcpdump and bmon, but they showed no traces of network traffic from my VMs to the attached EBS volumes. Additionally, I tried to use perf (perf record -e probe:inet_sendmsg -aR sleep 60) to trace inet_sendmsg calls, but that didn't yield anything either.
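For reference, here is a minimal sketch of how the comparison could be cross-checked from inside the instance, assuming a Linux guest: it parses the kernel's per-device write counters (/proc/diskstats) and per-interface transmit counters (/proc/net/dev) so the two can be sampled around a disk workload. The function names are my own, and this only shows what the guest OS itself accounts for.

```python
import time

SECTOR_BYTES = 512  # /proc/diskstats reports sectors in 512-byte units


def parse_diskstats(text):
    """Map device name -> total bytes written, from /proc/diskstats content."""
    written = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 10:
            # fields[2] is the device name, fields[9] is sectors written
            written[fields[2]] = int(fields[9]) * SECTOR_BYTES
    return written


def parse_net_dev(text):
    """Map interface name -> total bytes transmitted, from /proc/net/dev content."""
    sent = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # the two header lines contain no colon
        name, counters = line.split(":", 1)
        fields = counters.split()
        if len(fields) >= 9:
            # fields[8] is the transmit byte counter
            sent[name.strip()] = int(fields[8])
    return sent


def delta(before, after):
    """Per-key growth between two counter snapshots."""
    return {key: after[key] - before[key] for key in after if key in before}


def read_file(path):
    with open(path) as handle:
        return handle.read()


def sample(seconds):
    """Return (disk_bytes_written, net_bytes_sent) deltas over a time window."""
    disk_before = parse_diskstats(read_file("/proc/diskstats"))
    net_before = parse_net_dev(read_file("/proc/net/dev"))
    time.sleep(seconds)  # run the disk workload during this window
    disk_after = parse_diskstats(read_file("/proc/diskstats"))
    net_after = parse_net_dev(read_file("/proc/net/dev"))
    return delta(disk_before, disk_after), delta(net_before, net_after)
```

Calling sample(60) while writing to the EBS-backed device gives both sets of deltas side by side.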

This is very surprising to me -- I was expecting to see increased network utilization, since my understanding is that EBS volumes are attached to the EC2 instances as network block devices (https://en.wikipedia.org/wiki/Network_block_device).

I was wondering if the kind members of the community could demystify some of this for me. I would be very grateful for your input on this. 😄

asked 8 months ago · 302 views
1 Answer

An EBS volume is not a network-attached device, and doesn't use any network devices or network drivers for data transfer.

EBS is a block device, much like a hard disk is in the "real world".

Are you perhaps thinking of EFS, which reads & writes data to & from the EC2 over the network?

Steve_M (EXPERT)
answered 8 months ago
reviewed by AWS EXPERT 8 months ago
  • Hey, Steve! Thanks a lot for your response.

    If I understand correctly, EBS volumes aren't physical storage devices (HDDs/SSDs) attached to EC2 instances; instead, they're disaggregated, right? From https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-volumes.html, I see the following:

    1. "When you create an EBS volume, it is automatically replicated within its Availability Zone to prevent data loss due to failure of any single hardware component."
    2. "An EBS volume is off-instance storage that can persist independently from the life of an instance. "

    If the volumes are indeed disaggregated (off-instance storage), and data is replicated to the volume, shouldn't there be some egress network traffic from the EC2 instance to EBS? Please let me know if I'm missing something here. :D

  • "If I understand correctly, EBS volumes aren't physical storage devices (HDDs/SSDs)"

    Ultimately that's exactly what they are. If you were able to trace it back and unwrap all the layers of abstraction between the EBS volume and the AWS data centre, you would find that every block of diskspace that you provision in EBS can be traced back to a location on a disk drive within a storage array in a data centre.

    Obviously if you allocate yourself a 100GB volume, there isn't a 100GB hard drive in the data centre that now has your name on it. There will be thousands of disks in multiple disk arrays, petabytes and petabytes worth, all joined together in storage pools using RAID and de-dupe and other smart things.

    When you allocate yourself a 100GB EBS volume, the array will carve off 100GB from the pool and it will be presented to you in the form of an EBS volume for you to allocate to the EC2 instance. It's not a physical storage device that you can feel and touch like the SSD inside your laptop, but then again the EC2 isn't something you can feel and touch either. It's all virtualised and abstracted, but ultimately it works the same - instead of a physical server with a physical disk controller on a physical motherboard attached to a physical SSD, it's all virtualised.

    A laptop or server doesn't use the network to read and write data from its directly-attached disks, and neither does EC2 use the network to read & write EBS.

    (continued in next comment)

    1. "When you create an EBS volume, it is automatically replicated within its Availability Zone to prevent data loss due to failure of any single hardware component."

    This is a simplified way of saying that the storage array on which your data ultimately lives is doing hardware RAID behind the scenes. When I said above that every block of diskspace that you provision can be traced back to a location on a disk drive within a storage array, it's actually going to exist in two or three locations, to guard against hardware failure (disks go bad, so do cables, so do power supplies, etc.).

    It doesn't mean that your EC2 is having to do the job of replicating the data to somewhere else on the network.

    2. "An EBS volume is off-instance storage that can persist independently from the life of an instance. "

    This is to distinguish EBS from instance store volumes which are block devices that can only be attached to the specified EC2, and only exist for the life of the instance. With EBS you can attach the volume to an instance, write some data to it, then detach the volume and terminate the instance.

    Later on you can provision a new EC2, attach the EBS volume, and the data written by the old EC2 will be available for the new EC2 to consume.

  • Thanks a lot for the elaborate and AWESOME answer, Steve! This makes things clearer to me.

    One more question here -- how does the block device driver on the VM communicate with the storage array? What layers of abstraction does it go through?

  • I'm not sure of the specifics of how AWS does it, but it will be along these lines:

    Start with a whole disk array stuffed full of a petabyte's worth of physical disks. There will be an attached computer whose job is to manage that disk array, let's call it the controller.

    Logically the controller will organise this as a lot of smaller volumes where each volume is striped across a lot of physical disks (for performance) and let's say 20% of the total is used for parity (so data isn't lost in the event of a disk failure), so out of 1PB we have 800TB useable. Let's say that the controller presents this to the underlying physical server - on which a hypervisor is running - as 100 x 8TB volumes.
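As a rough sketch of that arithmetic (the figures are just the illustrative ones above, not AWS's real numbers, and the function names are made up for this example):

```python
def usable_capacity_tb(raw_tb, parity_percent):
    """Capacity left for data once the parity share is set aside."""
    return raw_tb * (100 - parity_percent) // 100


def volume_size_tb(usable_tb, volume_count):
    """Size of each logical volume when usable space is split evenly."""
    return usable_tb // volume_count


RAW_TB = 1000  # 1PB of physical disk, taking 1PB = 1000TB
usable = usable_capacity_tb(RAW_TB, parity_percent=20)
per_volume = volume_size_tb(usable, volume_count=100)
print(usable, per_volume)  # prints "800 8"
```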

    Neither the server nor hypervisor knows that they're seeing logical volumes that are abstracted away from them by a storage controller. They can't tell the difference between that and a load of "real" 8TB disks from (say) Seagate, and they don't need to know the difference.

    On the hypervisor you provision a VM (and that's all an EC2 instance is, a VM running on AWS's hypervisor) and you want to give it 100GB disk. The hypervisor allocates 100GB from one of these 8TB volumes to your VM, or it might do 50GB from each of two volumes, or 40+40+20 from three volumes - it doesn't matter, the hypervisor will abstract this from the VM, all the VM sees is a single 100GB block device appear at the other end of its (virtualised) disk adapter interface.
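To make the "50GB from each of two volumes, or 40+40+20" idea concrete, here is a toy first-fit allocator. It is purely hypothetical and is not how AWS or any particular hypervisor actually does it; allocate and its arguments are names invented for illustration.

```python
def allocate(request_gb, free_gb):
    """Carve request_gb out of a list of per-volume free space (in GB).

    Returns a list of (volume_index, gb_taken) extents, filling volumes in
    order. The VM would only ever see the total as one block device.
    """
    remaining = request_gb
    extents = []
    for index, free in enumerate(free_gb):
        if remaining == 0:
            break
        take = min(free, remaining)
        if take > 0:
            extents.append((index, take))
            remaining -= take
    if remaining > 0:
        raise ValueError("not enough free space in the pool")
    return extents


# One volume with plenty of room: the whole disk comes from it.
print(allocate(100, [8000, 8000]))   # [(0, 100)]
# Two nearly full volumes and one empty one: 40 + 40 + 20.
print(allocate(100, [40, 40, 8000]))  # [(0, 40), (1, 40), (2, 20)]
```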

    (continued in next comment)
