Why Your Fresh EBS Volume Shows Data (But Your Snapshot Doesn't)

A deep dive into EBS block tracking, snapshots, and the curious case of phantom data

Have you ever attached a brand-new EBS volume to your EC2 instance, run a quick dd command to peek at its contents, and been surprised to see what looks like actual data? You're not alone. This scenario puzzles many engineers, especially when they discover that creating a snapshot of that same "data-filled" volume results in a snapshot containing exactly zero blocks.

Let me walk you through what’s actually happening, why it matters, and what it teaches us about how EBS works under the hood.

The Mystery: Data That Isn’t Really There

Picture this scenario. You’ve just launched an EC2 instance and attached a fresh 100 GB EBS volume.

Before formatting or mounting it, curiosity gets the better of you, and you decide to take a look at what’s on the raw device:

dd if=/dev/nvme1n1 bs=1M count=1 skip=1000 | hexdump -C | head -20

To your surprise, the output shows what appears to be data:

00000000  df 78 36 77 67 11 ae 9c  71 8b 43 9c e5 e5 34 b7  |.x6wg...q.C...4.|
00000010  0a c7 39 f8 36 d5 13 f1  7a 10 99 7e d1 c4 b9 4b  |..9.6...z..~...K|
00000020  b1 65 1c c9 ae 38 8f 63  cd 6b 9b d7 a2 27 06 3a  |.e...8.c.k...'.:|
00000030  2a b5 e6 55 5e d9 4f 36  b3 47 52 71 20 85 fb b7  |*..U^.O6.GRq ...|
...

That’s clearly not zeros.
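If you'd rather not eyeball hexdump output, a small helper can count the non-zero bytes in a chunk for you. This is only a minimal sketch: count_nonzero_bytes is a hypothetical name, and it assumes GNU dd (for bs=1M) and read access to the device.

```shell
#!/bin/sh
# Hypothetical helper: count the non-zero bytes in one 1 MiB chunk of a file or device.
# $1 = path (for example /dev/nvme1n1), $2 = index of the 1 MiB chunk to inspect
count_nonzero_bytes() {
    # Read one chunk, delete every NUL byte, and count whatever is left.
    dd if="$1" bs=1M count=1 skip="$2" 2>/dev/null | tr -d '\000' | wc -c | tr -d ' '
}

# Example (not run here): count_nonzero_bytes /dev/nvme1n1 1000
```

Called with chunk index 1000, it inspects the same region as the dd command above. A result of 0 would mean the chunk is all zeros.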

> So where did this data come from? Is this leftover from a previous customer? Should you be concerned?

Now here’s where it gets interesting. You create a snapshot of this volume and use the EBS APIs to examine its contents:

aws ebs list-snapshot-blocks --snapshot-id <SNAPSHOT-ID>

The snapshot contains zero blocks, and its full snapshot size is 0 B.

> If the volume has data, why doesn’t the snapshot capture it?

Understanding What You’re Actually Seeing

The “data” you observed with dd isn't real data at all. It's an artifact of how modern storage systems work, and there are two primary sources depending on your configuration.

If your EBS volume is encrypted — which is increasingly common as many AWS accounts now enable encryption by default — what you’re seeing is cryptographic noise.

When you read a block that has never been written to, the encryption layer still needs to return something. It can’t simply return zeros because that would leak information about which blocks contain actual data. Instead, it returns what amounts to encrypted nothingness, which looks like random bytes to anyone examining the raw device.

By returning encryption artifacts for all reads, whether the block was written or not, EBS ensures that you cannot distinguish between “encrypted data” and “encrypted nothing.” This is a deliberate security design, not a bug.

On Nitro-based instances, the NVMe controller itself may return non-zero patterns for blocks that have never been written. The NVMe specification allows controllers flexibility in what they return for deallocated or never-written blocks. AWS’s implementation may return pseudo-random patterns rather than zeros.

→ Here’s the critical point that addresses any security concerns: this is absolutely not data from a previous customer.

AWS provides complete isolation between customers at both the physical and logical levels. Storage blocks are cryptographically erased before reuse, and each volume has unique encryption keys. Even in the hypothetical scenario where raw storage blocks were somehow shared, decryption would be impossible without the original keys.

The Key Insight: EBS Tracks Writes, Not Reads

This brings us to the fundamental concept that explains the snapshot behavior: EBS maintains a Change Block Tracking system that records which blocks have been explicitly written to. Think of it like a guest book at a hotel. The hotel knows which rooms have had guests check in (writes), but it doesn’t track every time someone walks past a room and glances at the door (reads). When you read from an unwritten block, the storage system returns something (encryption artifacts, controller patterns, or zeros), but this read operation doesn’t register as “this block contains data.”

When you create a snapshot, EBS consults its write tracking metadata and asks a simple question: “which blocks have been written since this volume was created?” For a fresh volume with no writes, the answer is none, so the snapshot contains zero blocks.
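As a rough mental model, the idea can be sketched in a few lines of shell. This is a toy illustration only, not how EBS is implemented: track_write, track_read, and snapshot_block_count are hypothetical names, and the 512 KiB granularity is borrowed from the snapshot block size discussed later in this article.

```shell
#!/bin/sh
# Toy model of write tracking (illustrative only; not the EBS implementation).
BLOCK_SIZE=524288   # 512 KiB tracking granularity (assumed for this sketch)
written=""          # space-separated list of block indexes that received a write

# Record every block touched by a write of $2 bytes starting at byte offset $1.
track_write() {
    first=$(( $1 / BLOCK_SIZE ))
    last=$(( ($1 + $2 - 1) / BLOCK_SIZE ))
    i=$first
    while [ "$i" -le "$last" ]; do
        case " $written " in
            *" $i "*) ;;                     # block already tracked
            *) written="$written $i" ;;
        esac
        i=$(( i + 1 ))
    done
}

# Reads hand bytes back to the caller but never touch the tracking list.
track_read() { :; }

# A "snapshot" in this model is just the set of tracked blocks; print its size.
snapshot_block_count() {
    set -- $written
    echo "$#"
}

track_read 1048576000 1048576    # peek at 1 MiB far into the volume
snapshot_block_count             # prints 0: reads are not tracked
track_write 262144 1048576       # write 1 MiB starting at offset 256 KiB
snapshot_block_count             # prints 3: the write spans blocks 0, 1, and 2
```

In this model a partial write marks the whole block it lands in, which is why a 1 MiB unaligned write shows up as three blocks rather than two.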

You can verify this behavior with a simple experiment. First, write some actual data to your volume:

dd if=/dev/urandom \
of=/dev/nvme1n1 \
bs=1M \
count=10 \
oflag=direct

Now create a new snapshot and check its blocks:

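aws ec2 create-snapshot \
--volume-id <VOLUME-ID> \
--description "After writing 10MB"

# Wait for the snapshot to complete so that its block list is final
aws ec2 wait snapshot-completed --snapshot-ids <snap-newone>

aws ebs list-snapshot-blocks \
--snapshot-id <snap-newone> | jq '.Blocks | length'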

(Block size = 524288 bytes = 512 KiB)

You’ll see 20 blocks (indexes 0 to 19) in the snapshot.

> Why 20?

Because EBS snapshots use 512 KiB blocks, and the 10 MiB that dd wrote divided by 512 KiB is exactly 20. The write tracking system recorded those writes, and the snapshot captured them.
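You can check the arithmetic directly in the shell. The sketch below assumes the 524288-byte block size shown in the API output. blocks_touched is a hypothetical helper that uses ceiling division for a write starting at offset 0.

```shell
#!/bin/sh
BLOCK_SIZE=524288   # snapshot block size in bytes (512 KiB)

# Hypothetical helper: number of snapshot blocks covered by a write of $1 bytes
# that starts at byte offset 0 (ceiling division).
blocks_touched() {
    echo $(( ($1 + BLOCK_SIZE - 1) / BLOCK_SIZE ))
}

blocks_touched $(( 10 * 1024 * 1024 ))   # dd bs=1M count=10 writes 10 MiB: prints 20
blocks_touched 10000000                  # 10 decimal MB still touches 20 blocks
```

The second call shows why the MB versus MiB distinction rarely changes the block count: a partially filled final block still counts as a whole block.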

Note that the API doesn’t directly give you the total number of blocks. Each call returns up to 10,000 blocks. If the snapshot has more than 10,000 blocks, the output includes a NextToken that you pass to the next call to fetch the following page.

To count the blocks returned in a single response, use:

aws ebs list-snapshot-blocks \
--snapshot-id <snap-newone> | jq '.Blocks | length'
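For snapshots larger than one page, the count has to be accumulated across calls by following NextToken. Here is a minimal sketch of that loop, assuming the AWS CLI and jq are installed. count_snapshot_blocks is a hypothetical helper name, and error handling is deliberately thin.

```shell
#!/bin/sh
# Hypothetical helper: count every block in a snapshot by following NextToken.
# $1 = snapshot ID. Requires the AWS CLI and jq.
count_snapshot_blocks() {
    snapshot_id=$1
    total=0
    token=""
    while :; do
        if [ -n "$token" ]; then
            page=$(aws ebs list-snapshot-blocks --snapshot-id "$snapshot_id" \
                --next-token "$token" --output json) || return 1
        else
            page=$(aws ebs list-snapshot-blocks --snapshot-id "$snapshot_id" \
                --output json) || return 1
        fi
        # Count the blocks on this page (treat a missing list as empty).
        count=$(printf '%s' "$page" | jq '(.Blocks // []) | length')
        total=$(( total + count ))
        # An absent NextToken means this was the last page.
        token=$(printf '%s' "$page" | jq -r '.NextToken // empty')
        [ -n "$token" ] || break
    done
    echo "$total"
}

# Example (not run here): count_snapshot_blocks <SNAPSHOT-ID>
```

For the 20-block snapshot in this walkthrough a single page is enough, so the loop runs once and prints the same number as the one-liner above.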

The snapshot’s full size is now 10 MiB.

A Word About Block Sizes

Speaking of block sizes, it’s worth understanding that “block size” means different things at different layers of the storage stack, and these sizes don’t necessarily match.

The Storage Stack

[Figure: diagram of the storage stack layers]

Checking Block Sizes at Each Layer

— Filesystem Block Size:

At the filesystem level, when you format a volume with XFS or ext4, you’re typically working with 4 KB blocks. You can verify this by checking the output of the mkfs command when you format the disk, or by running xfs_info on an XFS filesystem and looking for the bsize value, which is usually 4096.

xfs_info /dev/nvme0n1p1 

— OS Logical/Physical Block Size:

  • Logical block size (what the OS sees)
cat /sys/block/nvme1n1/queue/logical_block_size

  • Physical block size (what the device reports)
cat /sys/block/nvme1n1/queue/physical_block_size

— NVMe driver layer:

cat /sys/block/nvme1n1/queue/optimal_io_size
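To read these values in one go, the sysfs lookups can be wrapped in a small function. This is a sketch under assumptions: print_block_sizes is a hypothetical name, and the optional second argument (an alternate sysfs root) exists only to make the function easy to test.

```shell
#!/bin/sh
# Hypothetical helper: print the block-size attributes the kernel exposes for a device.
# $1 = device name (for example nvme1n1); $2 = sysfs root, defaults to /sys
print_block_sizes() {
    queue="${2:-/sys}/block/$1/queue"
    for attr in logical_block_size physical_block_size optimal_io_size; do
        if [ -r "$queue/$attr" ]; then
            printf '%s=%s\n' "$attr" "$(cat "$queue/$attr")"
        else
            printf '%s=unavailable\n' "$attr"
        fi
    done
}

# Example (not run here): print_block_sizes nvme1n1
```

Each attribute is printed as name=value, with "unavailable" for any attribute the kernel does not expose for that device.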

Cost Implications: Pay for What You Write

Understanding write tracking has direct implications for your AWS bill. EBS snapshot pricing is based on the actual data stored, not the volume size. If you have a 1 TB volume but have only written 50 GB of data to it, your snapshot stores and bills for approximately 50 GB.

This becomes even more efficient with incremental snapshots. After your initial snapshot captures all written blocks, subsequent snapshots only capture blocks that have changed. If you modify 2 GB of data between snapshots, your second snapshot adds only about 2 GB to your storage costs, not another 50 GB.

For the example we explored earlier, with 20 blocks of 512 KiB each, the total data is just 10 MiB. At standard snapshot pricing, that’s a fraction of a cent per month, dramatically less than you might expect from a 100 GB volume.
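To put rough numbers on this, here is a back-of-the-envelope estimate. It is a sketch only: estimate_snapshot_cost is a hypothetical helper, the 0.05 USD per GB-month rate is an assumed illustrative price (check current pricing for your Region), and GB is treated as GiB for simplicity.

```shell
#!/bin/sh
# Hypothetical helper: estimate monthly snapshot storage cost in USD.
# $1 = number of 512 KiB blocks stored, $2 = price per GB-month (assumed rate)
estimate_snapshot_cost() {
    awk -v blocks="$1" -v price="$2" 'BEGIN {
        gib = blocks * 524288 / (1024 * 1024 * 1024)   # blocks -> GiB
        printf "%.6f\n", gib * price
    }'
}

estimate_snapshot_cost 20 0.05       # the 10 MiB example: prints 0.000488
estimate_snapshot_cost 102400 0.05   # 50 GiB of written data: prints 2.500000
estimate_snapshot_cost 4096 0.05     # a 2 GiB incremental change: prints 0.100000
```

Under these assumptions the 20-block snapshot costs well under a cent per month, and a 2 GiB incremental snapshot adds about ten cents rather than repeating the full 50 GiB charge.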