How do I copy an EBS Snapshot to another region using coldsnap and the EBS Direct APIs?

15 minute read
Content level: Expert

This article provides step-by-step guidance on using coldsnap to copy EBS snapshot data block-by-block across AWS regions using the EBS Direct APIs, and then verify the integrity of the resulting data.

If you are migrating data from the Middle East (UAE) Region (me-central-1), then you might experience increased error rates as we continue making progress with restoration efforts. For additional information about recovery efforts and service updates that impact your AWS accounts, see the AWS Personal Health Dashboard. For assistance with this event, contact AWS Support through the AWS Management Console or the AWS Support Center.

This approach complements the snapshot-based and image-based methods covered in How do I migrate my Compute and Container resources to another region?.

This is part of a series of articles that provide general guidance on migrating resources from one Region to another. This article provides guidance on copying EBS snapshots to another AWS Region with coldsnap and the EBS Direct APIs, using an intermediary EC2 instance. This method is useful when native snapshot copy or replication options are not available or practical.

For general guidance and a full list of domain and service-specific migration guides, see How do I migrate my resources to another region?

For other domains, see the following resources:

Overview

The standard approach to cross-region snapshot copies is aws ec2 copy-snapshot. However, there are scenarios where you need lower-level control over the snapshot data, or where the standard copy mechanism is unavailable.

coldsnap is an open-source command-line tool from AWS Labs that uses the EBS Direct APIs to download and upload EBS snapshot data block by block, without needing to create intermediate EBS volumes or manage volume attachments during the transfer itself.

The EBS Direct APIs provide six actions — three for reading (ListSnapshotBlocks, ListChangedBlocks, GetSnapshotBlock) and three for writing (StartSnapshot, PutSnapshotBlock, CompleteSnapshot).

coldsnap wraps these APIs into simple download and upload commands.

Key considerations

  • EBS Direct APIs cannot be used with archived snapshots or public snapshots.
  • For encrypted snapshots, the principal also needs kms:Decrypt on the source KMS key and kms:CreateGrant + kms:GenerateDataKeyWithoutPlaintext on the target KMS key.

Prerequisites

  • An EBS snapshot in the source region (e.g., snap-0abcdef1234567890 in us-east-1).
  • An EC2 Linux instance with the Amazon Linux 2023 AMI in the target region (e.g., eu-west-1) with:
    • Sufficient local disk space to hold the snapshot dump (at least the size of the source volume).
    • IAM role with EBS Direct API permissions in both source and target regions.
    • AWS CLI configured with credentials that have the required permissions. If the IAM role is attached through an instance profile, the AWS CLI obtains its credentials from the instance profile.

Launch a temporary instance in the target Region

Launch a temporary instance in the target region. EBS and networking performance depend on the instance type.

Select the instance type that best suits your needs. We recommend at least a t3.large instance.

Ensure the security group allows SSH (TCP 22) for Linux or use SSM Session Manager to connect to the instance.

The instance must be associated with an IAM Role with permissions for ebs:ListSnapshotBlocks, ebs:GetSnapshotBlock (source region) and ebs:StartSnapshot, ebs:PutSnapshotBlock, ebs:CompleteSnapshot (target region).

See Control access to EBS direct APIs using IAM.
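
The permissions listed above could be expressed as an identity-based policy along the following lines. This is a minimal sketch, not an official policy: the file name coldsnap-policy.json and the Sid values are arbitrary choices, and the wildcard snapshot resources should be narrowed where possible. Encrypted snapshots also need the KMS permissions listed under Key considerations, and the --wait option used later may additionally need ec2:DescribeSnapshots.

```shell
# Write a sample policy document (the file name is an arbitrary choice).
# Read actions are scoped to the source Region, write actions to the target Region.
cat > coldsnap-policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ReadSourceSnapshot",
      "Effect": "Allow",
      "Action": ["ebs:ListSnapshotBlocks", "ebs:GetSnapshotBlock"],
      "Resource": "arn:aws:ec2:us-east-1::snapshot/*"
    },
    {
      "Sid": "WriteTargetSnapshot",
      "Effect": "Allow",
      "Action": ["ebs:StartSnapshot", "ebs:PutSnapshotBlock", "ebs:CompleteSnapshot"],
      "Resource": "arn:aws:ec2:eu-west-1::snapshot/*"
    }
  ]
}
EOF
```

You can then attach the document to the instance role with the IAM console or the AWS CLI.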

AMI_ID=$(aws ssm get-parameters \
  --region eu-west-1 \
  --names /aws/service/ami-amazon-linux-latest/al2023-ami-kernel-default-x86_64 \
  --query 'Parameters[0].Value' \
  --output text)

aws ec2 run-instances \
  --region eu-west-1 \
  --image-id "$AMI_ID" \
  --instance-type t3.large \
  --placement AvailabilityZone=eu-west-1a \
  --key-name my-key-pair \
  --security-group-ids sg-0123456789abcdef0 \
  --tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=recovery-temp-instance}]'

For Windows volumes, you also need to launch a Windows-based recovery instance:

AMI_ID=$(aws ssm get-parameters \
  --region eu-west-1 \
  --names /aws/service/ami-windows-latest/Windows_Server-2022-English-Full-Base \
  --query 'Parameters[0].Value' \
  --output text)

aws ec2 run-instances \
  --region eu-west-1 \
  --image-id "$AMI_ID" \
  --instance-type t3.large \
  --placement AvailabilityZone=eu-west-1a \
  --key-name my-key-pair \
  --security-group-ids sg-0123456789abcdef0 \
  --tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=recovery-temp-instance-win}]'

Note the InstanceId from the output. Wait for it to be running:

aws ec2 wait instance-running \
  --region eu-west-1 \
  --instance-ids i-0abc123def456789

Install Cargo

Log on to the recovery instance and install cargo:

# install cargo using the AL2023 package
sudo dnf install cargo -y

Install coldsnap on the EC2 instance in the target region

Install coldsnap from source. Important: do not install it in any other way:

# Install coldsnap
cargo install --git https://github.com/awslabs/coldsnap.git --branch develop

This step might take a long time, depending on the instance type, as it will compile the tool and all its dependencies from source.

Then add the folder that contains the binary to your PATH:

export PATH=$PATH:~/.cargo/bin

Verify the installation:

coldsnap --help

Download the snapshot from the source region

Important! Ensure the instance has enough disk space to download the snapshot. Move to the folder to which you will download the snapshot and use

df -h `pwd`

to verify before starting.
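
The space check can also be scripted. The following is a minimal sketch: the helper name has_enough_space and the 1 GiB default safety margin are assumptions for illustration, not part of coldsnap or the AWS CLI.

```shell
# Succeeds when the available space (in KiB, as reported by `df -k`)
# is at least the source volume size (in GiB) plus a safety margin (in GiB).
has_enough_space() {
  local volume_gib=$1 avail_kib=$2 margin_gib=${3:-1}
  local needed_kib=$(( (volume_gib + margin_gib) * 1024 * 1024 ))
  [ "$avail_kib" -ge "$needed_kib" ]
}

# Example wiring (not run here; requires AWS credentials and GNU df):
#   VOLUME_GIB=$(aws ec2 describe-snapshots --region us-east-1 \
#     --snapshot-ids snap-0abcdef1234567890 \
#     --query 'Snapshots[0].VolumeSize' --output text)
#   AVAIL_KIB=$(df -k --output=avail . | tail -n 1 | tr -d ' ')
#   has_enough_space "$VOLUME_GIB" "$AVAIL_KIB" || echo "Not enough disk space" >&2
```

The comparison uses the full volume size because the downloaded image represents the whole volume, as stated in the prerequisites.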

Use coldsnap download to read the snapshot block by block from the source region and write it to a local file.

Specify the source region explicitly using --region before the download command and the --checkpoint option after the download command:

coldsnap \
  --region us-east-1 \
  download \
  snap-0abcdef1234567890 \
  snap-0abcdef1234567890.img \
  --checkpoint

This uses the EBS Direct APIs (ListSnapshotBlocks and GetSnapshotBlock) to download each 512 KiB block of the snapshot and write it sequentially to the output file.
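
Because each block is 512 KiB, you can estimate an upper bound on the number of blocks, and therefore on the number of GetSnapshotBlock calls, for a given volume size. The following is a minimal sketch; the function name max_blocks_for_gib is illustrative only. The real count is typically lower, because only blocks that contain data are stored in a snapshot.

```shell
# EBS direct APIs address snapshot data in 512 KiB blocks.
BLOCK_SIZE_BYTES=$((512 * 1024))

# Upper bound on the number of blocks for a volume of the given size in GiB.
max_blocks_for_gib() {
  local size_gib=$1
  echo $(( size_gib * 1024 * 1024 * 1024 / BLOCK_SIZE_BYTES ))
}

max_blocks_for_gib 100   # prints 204800
```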

During download two files will be created:

  • snap-0abcdef1234567890.img.partial containing the actual snapshot partial data
  • snap-0abcdef1234567890.img.coldsnap-progress containing a checkpoint of successful chunks that coldsnap can use if you restart it after a failure

Upon successful download, a single file, snap-0abcdef1234567890.img, will be present.

If the download fails to get some of the chunks of the snapshot, the output will be something like:

Failed to download snapshot: 
Failed to get 3 blocks for snapshot 'snap-0abcdef1234567890': 
blocks [3105, 4712, 58002]

Each block is 512 KiB so you can calculate where the gaps are in the partial file.
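
As a worked example of that calculation, the sketch below converts the block indexes from the sample error message into byte ranges. It assumes the image is a raw disk image in which block N starts at byte N × 512 KiB; the function name block_byte_range is illustrative only.

```shell
BLOCK_SIZE_BYTES=$((512 * 1024))

# Print the byte range "start end" (end exclusive) that a block index
# occupies in the image file.
block_byte_range() {
  local index=$1
  local start=$(( index * BLOCK_SIZE_BYTES ))
  echo "$start $(( start + BLOCK_SIZE_BYTES ))"
}

# Block indexes taken from the sample error message above.
for index in 3105 4712 58002; do
  echo "block $index: bytes $(block_byte_range "$index")"
done
```

For block 3105, for instance, the gap starts at byte 1627914240 of the partial file.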

If the download fails, the snapshot file also will remain with the .partial extension, and the .coldsnap-progress file will remain.

Rerunning the same coldsnap command retries only the missing blocks, which helps you recover from network or other transient failures.

The download time depends on the snapshot size, the network throughput, and the number of retries needed.
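
If you expect intermittent failures, you could wrap the rerun in a small retry loop. This is a sketch only: the retry helper is not part of coldsnap, and the attempt count and delay in the example are arbitrary values.

```shell
# Run a command until it succeeds, up to max_attempts times,
# sleeping delay_seconds between attempts.
retry() {
  local max_attempts=$1 delay_seconds=$2
  shift 2
  local attempt=1
  until "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "Giving up after $attempt attempts" >&2
      return 1
    fi
    attempt=$(( attempt + 1 ))
    sleep "$delay_seconds"
  done
}

# Example (not run here):
#   retry 5 30 coldsnap --region us-east-1 download \
#     snap-0abcdef1234567890 snap-0abcdef1234567890.img --checkpoint
```

Each attempt reuses the same command line, so the checkpoint file lets coldsnap continue from the blocks it already has.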

First check

If the snapshot you downloaded is from a Linux volume, you can attach the downloaded image to a loopback device and mount it before uploading it to EBS in the target region, to confirm that the data in the snapshot is usable.

This works with both complete images (.img) and partial images (.img.partial):

sudo losetup -f --show snap-0abcdef1234567890.img

This assigns the first available loopback device to the image. The output is the name of the device:

/dev/loop0
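
If you only want to inspect the image, a more defensive variant attaches it read-only and scans its partition table, so that partitions appear as /dev/loop0p1 and so on. The following is a sketch; the wrapper name attach_image_readonly is an assumption for illustration, while the losetup options are standard util-linux flags.

```shell
# Attach an image read-only and ask the kernel to scan its partition table,
# so partitions show up as /dev/loopNp1, /dev/loopNp2, and so on.
attach_image_readonly() {
  local image=$1
  sudo losetup --find --show --read-only --partscan "$image"
}

# Example (not run here; requires root):
#   LOOP_DEV=$(attach_image_readonly snap-0abcdef1234567890.img)
#   lsblk -f "$LOOP_DEV"
#   sudo losetup --detach "$LOOP_DEV"   # when finished
```

A read-only attachment suits the no-modify checks; any repair that writes to the image needs a writable attachment.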

You can now follow the steps provided in section Filesystem check, mount, and recovery before uploading the image to create a snapshot in the target region.


Upload the dump as a new snapshot in the target region

Upload the local file as a new EBS snapshot in the target region:

coldsnap \
  --region eu-west-1 \
  upload \
  --wait \
  snap-0abcdef1234567890.img

The --wait flag causes coldsnap to poll until the snapshot reaches the completed state.

This uses the EBS Direct APIs (StartSnapshot, PutSnapshotBlock, CompleteSnapshot) to write each block into the new snapshot.

The command outputs the new snapshot ID, for example:

snap-0fedcba9876543210
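
To reuse the new snapshot ID in later commands, you could capture it in a variable and sanity-check its format. This sketch assumes that coldsnap prints only the snapshot ID on standard output; the helper name is_snapshot_id is illustrative.

```shell
# Succeeds when the argument looks like an EBS snapshot ID
# (snap- followed by 8 or 17 lowercase hexadecimal characters).
is_snapshot_id() {
  printf '%s\n' "$1" | grep -Eq '^snap-([0-9a-f]{8}|[0-9a-f]{17})$'
}

# Example (not run here; requires coldsnap and AWS credentials):
#   NEW_SNAPSHOT_ID=$(coldsnap --region eu-west-1 upload --wait snap-0abcdef1234567890.img)
#   is_snapshot_id "$NEW_SNAPSHOT_ID" || echo "Unexpected output: $NEW_SNAPSHOT_ID" >&2
#   aws ec2 describe-snapshots --region eu-west-1 --snapshot-ids "$NEW_SNAPSHOT_ID" \
#     --query 'Snapshots[0].[State,VolumeSize,Encrypted]' --output text
```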

Optionally, tag the snapshot for identification:

aws ec2 create-tags \
  --region eu-west-1 \
  --resources snap-0fedcba9876543210 \
  --tags Key=Name,Value="Recovered from us-east-1" Key=Source,Value=snap-0abcdef1234567890

Create an EBS volume from the new snapshot in the same AZ as the recovery instance

Create a volume in a specific Availability Zone within the target region.

The volume and the instance you attach it to must be in the same AZ.

See Create an Amazon EBS volume.

aws ec2 create-volume \
  --region eu-west-1 \
  --availability-zone eu-west-1a \
  --snapshot-id snap-0fedcba9876543210 \
  --volume-type gp3 \
  --tag-specifications 'ResourceType=volume,Tags=[{Key=Name,Value=recovery-volume}]'

Note the VolumeId from the output (e.g., vol-0abc123def456789).

Wait for the volume to become available:

aws ec2 wait volume-available \
  --region eu-west-1 \
  --volume-ids vol-0abc123def456789

Attach the volume to the temporary instance

For Linux you can reuse the Linux instance or launch a new one. For Windows you need to launch a Windows recovery instance as well.

Attach the volume to the instance. See Attach an Amazon EBS volume to an Amazon EC2 instance.

aws ec2 attach-volume \
  --region eu-west-1 \
  --volume-id vol-0abc123def456789 \
  --instance-id i-0abc123def456789 \
  --device /dev/sdf

Filesystem check, mount, and recovery for Linux volumes

Note: filesystem checks ensure the integrity of the filesystem and not necessarily of all data stored. You should use the tooling relevant to your application to verify the consistency of the data stored in the snapshot. For instance, if you are running MySQL on the instance, you would run OPTIMIZE, CHECK and REPAIR on all tables.

Identify the device and run filesystem checks (Linux)

SSH into the temporary instance and identify the device:

# get the device name (nvme1n1 in this example)
lsblk

# verifies the type of the special device file 
sudo file -s /dev/nvme1n1

The output should be something like:

/dev/nvme1n1: DOS/MBR boot sector, extended partition table (last)

Then get the partition filesystem information with:

sudo lsblk -f

The output will be something like:

NAME          FSTYPE FSVER LABEL UUID                                 FSAVAIL FSUSE% MOUNTPOINTS
nvme0n1
├─nvme0n1p1   xfs          /     bb1ad377-aefa-4354-a419-b1d6a31d6d2c   79.7G    20% /
├─nvme0n1p127
└─nvme0n1p128 vfat   FAT16       3D07-5F4A                               8.7M    13% /boot/efi
nvme1n1
├─nvme1n1p1   xfs          /     e8b842a5-f549-434e-bc03-152dd3e41fc6
└─nvme1n1p128

In this case, the filesystem type in partition nvme1n1p1 is xfs.

Before mounting, run a filesystem check on the unmounted device. The tool depends on the filesystem type. See Make an Amazon EBS volume available for use.

For ext4 filesystems:

# Dry-run check first (no modifications); run it on the partition that holds the filesystem
sudo e2fsck -n /dev/nvme1n1p1

# If errors are found, run with automatic repair
sudo e2fsck -y /dev/nvme1n1p1

For XFS filesystems:

# Check the filesystem in no-modify mode (-n reports problems without changing anything)
sudo xfs_repair -n /dev/nvme1n1p1

# If errors are found, run repair
sudo xfs_repair /dev/nvme1n1p1
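
The choice between the two tools can be derived from the filesystem type that lsblk reports. The following is a minimal sketch; the function name fs_check_command is illustrative, and it covers only the ext and XFS cases discussed here.

```shell
# Print the no-modify check command for a filesystem type and device.
fs_check_command() {
  local fstype=$1 device=$2
  case "$fstype" in
    ext2|ext3|ext4) echo "e2fsck -n $device" ;;
    xfs)            echo "xfs_repair -n $device" ;;
    *)              echo "Unsupported filesystem type: $fstype" >&2; return 1 ;;
  esac
}

# Example (not run here; requires the attached volume):
#   FSTYPE=$(lsblk -no FSTYPE /dev/nvme1n1p1)
#   sudo $(fs_check_command "$FSTYPE" /dev/nvme1n1p1)
```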

Important: Always run the check on an unmounted filesystem. Running e2fsck or xfs_repair on a mounted filesystem can cause data corruption.
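
A quick guard against checking a mounted filesystem is to look the device up in the kernel's mount table first. This sketch assumes that the device appears under the same path in /proc/mounts; the function name is_mounted and its optional second argument are illustrative.

```shell
# Succeeds when the device appears as a mount source in the given mounts table
# (defaults to /proc/mounts, where the first field of each line is the device).
is_mounted() {
  local device=$1 mounts_file=${2:-/proc/mounts}
  awk -v dev="$device" '$1 == dev { found = 1 } END { exit !found }' "$mounts_file"
}

# Example (not run here):
#   if is_mounted /dev/nvme1n1p1; then
#     echo "Unmount /dev/nvme1n1p1 before running e2fsck or xfs_repair" >&2
#   fi
```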

If the filesystem check reports unrecoverable errors, proceed to the section Run specialised file recovery tools (if needed) (Linux) before attempting to mount.

Mount the volume and clean up unnecessary data (Linux) - optional

If the filesystem check was successful, mount the volume:

sudo mkdir -p /mnt/recovery
sudo mount /dev/nvme1n1p1 /mnt/recovery

Remove unnecessary data and empty caches to reduce the space used on the volume and prepare it for use:

# Clear package manager caches
sudo rm -rf /mnt/recovery/var/cache/yum/*
sudo rm -rf /mnt/recovery/var/cache/dnf/*
sudo rm -rf /mnt/recovery/var/cache/apt/archives/*.deb 2>/dev/null

# Clear temporary files
sudo rm -rf /mnt/recovery/tmp/*
sudo rm -rf /mnt/recovery/var/tmp/*

# Clear log files (optional — review before deleting)
sudo find /mnt/recovery/var/log -type f -name "*.gz" -delete
sudo find /mnt/recovery/var/log -type f -name "*.old" -delete

# Clear user-level caches
sudo rm -rf /mnt/recovery/home/*/.cache/*

Run specialised file recovery tools (if needed) (Linux)

If the filesystem check reported errors that could not be fully repaired, or if you suspect data loss, use specialised recovery tools.

Install recovery tools on the temporary instance:

wget https://www.cgsecurity.org/testdisk-7.2.linux26-x86_64.tar.bz2
tar xjf testdisk-7.2.linux26-x86_64.tar.bz2
cd testdisk-7.2

TestDisk — recovers lost partitions and repairs partition tables:

sudo ./testdisk_static /dev/nvme1n1

Follow the interactive menu to analyse the disk, search for lost partitions, and write a corrected partition table if needed.

PhotoRec (bundled with TestDisk) — recovers individual files regardless of filesystem state:

sudo ./photorec_static /dev/nvme1n1

Follow the prompts to select the partition, filesystem type, and output directory for recovered files.

For ext4 filesystems with journal issues:

# Replay the journal (run on the partition that holds the filesystem)
sudo e2fsck -y /dev/nvme1n1p1

# If the journal is corrupt, recreate it (last resort)
sudo tune2fs -O ^has_journal /dev/nvme1n1p1
sudo e2fsck -y /dev/nvme1n1p1
sudo tune2fs -j /dev/nvme1n1p1

Filesystem check, mount, and recovery for Windows volumes

Bring the disk online and run chkdsk (Windows)

Connect to the Windows instance via RDP. By default, Windows keeps newly attached EBS volumes offline. See Make an Amazon EBS volume available for use and Resolve offline EBS volume on EC2 Windows instance.

Open PowerShell as Administrator and identify the disk:

Get-Disk

The attached volume will appear with OperationalStatus: Offline and PartitionStyle: MBR or GPT. Note the disk number (e.g., 1).

Bring the disk online read-only first, so chkdsk can scan without the OS writing to it. The -IsOffline and -IsReadOnly parameters are in different parameter sets and must be set in separate calls. Set read-only first, then bring online:

Set-Disk -Number 1 -IsReadOnly $true
Set-Disk -Number 1 -IsOffline $false

Identify the volume and its drive letter(s) or assign one:

Get-Partition -DiskNumber 1
# If no drive letter is assigned:
Get-Partition -DiskNumber 1 | Set-Partition -NewDriveLetter D

Run chkdsk in read-only scan mode for NTFS volumes:

chkdsk D: /scan

The /scan parameter runs an online scan without fixing any errors or taking the disk offline.

If errors are found, make the disk writable and run the repair:

Set-Disk -Number 1 -IsReadOnly $false
chkdsk D: /f /r
  • /f — fixes filesystem errors and requires exclusive access to the volume.
  • /r — locates bad sectors and recovers readable information.

Important: chkdsk /f and /r require exclusive access to the volume. Do not run them on the system (C:) drive of a running instance, only on a secondary attached volume as shown here.

Clean up unnecessary data (Windows)

After chkdsk completes successfully, the volume is mounted at the assigned drive letter (e.g., D:\). If you have not already done so, make the disk writable, and then remove unnecessary data and caches:

Set-Disk -Number 1 -IsReadOnly $false
# Clear Windows temp files
Remove-Item -Path "D:\Windows\Temp\*" -Recurse -Force -ErrorAction SilentlyContinue
Remove-Item -Path "D:\Users\*\AppData\Local\Temp\*" -Recurse -Force -ErrorAction SilentlyContinue

# Clear Windows Update cache
Remove-Item -Path "D:\Windows\SoftwareDistribution\Download\*" -Recurse -Force -ErrorAction SilentlyContinue

# Clear Windows Prefetch
Remove-Item -Path "D:\Windows\Prefetch\*" -Force -ErrorAction SilentlyContinue

# Clear user-level browser caches (Edge/Chrome)
Remove-Item -Path "D:\Users\*\AppData\Local\Microsoft\Edge\User Data\*\Cache\*" -Recurse -Force -ErrorAction SilentlyContinue
Remove-Item -Path "D:\Users\*\AppData\Local\Google\Chrome\User Data\*\Cache\*" -Recurse -Force -ErrorAction SilentlyContinue

Note: If the volume is a Windows system drive, avoid deleting files under D:\Windows\System32\config (registry hives) or D:\Windows\WinSxS (component store) as these are required for the OS to boot.

Run specialised recovery tools (if needed) (Windows)

If chkdsk reported unrecoverable errors, or if you need to recover deleted files or repair a corrupt partition table, use the following tools.

EC2Rescue for Windows Server — AWS-provided tool for diagnosing and repairing common Windows issues on EC2 instances. See Use EC2Rescue to troubleshoot EC2 Windows instance issues.

# Download EC2Rescue
Invoke-WebRequest -Uri "https://s3.amazonaws.com/ec2rescue/windows/EC2Rescue_latest.zip" -OutFile "$env:TEMP\EC2Rescue.zip"
Expand-Archive -Path "$env:TEMP\EC2Rescue.zip" -DestinationPath "$env:TEMP\EC2Rescue"

# Run EC2Rescue (interactive GUI)
& "$env:TEMP\EC2Rescue\EC2Rescue.exe"

EC2Rescue can fix boot configuration, restore registry hives from backup, and repair common OS-level issues.

Windows Recovery tools for corrupt registry:

If the volume is a Windows system drive with a corrupt registry, you can restore from the automatic backup. See Restore a corrupt registry on an EC2 Windows instance.

# Back up current registry hives
Copy-Item -Path "D:\Windows\System32\config\SYSTEM" -Destination "D:\Windows\System32\config\SYSTEM.bak"
Copy-Item -Path "D:\Windows\System32\config\SOFTWARE" -Destination "D:\Windows\System32\config\SOFTWARE.bak"

# Restore from RegBack (if available)
Copy-Item -Path "D:\Windows\System32\config\RegBack\SYSTEM" -Destination "D:\Windows\System32\config\SYSTEM" -Force
Copy-Item -Path "D:\Windows\System32\config\RegBack\SOFTWARE" -Destination "D:\Windows\System32\config\SOFTWARE" -Force

Note: Windows 10/Server 2019 and later no longer populate RegBack by default. If the directory is empty, EC2Rescue or a System Restore point is the alternative.

Third party options:

  • TestDisk for Windows — recovers lost NTFS/FAT partitions and repairs partition tables. Download from cgsecurity.org and run from the recovery instance:
  • PhotoRec for Windows (bundled with TestDisk) — recovers individual files from damaged NTFS/FAT volumes regardless of filesystem state:

Clean up

After you have finished with the recovery, unmount the volume (Linux) or take it offline (Windows), and then clean up the temporary resources.

Linux — unmount:

sudo umount /mnt/recovery

Windows — take the disk offline:

Set-Disk -Number 1 -IsOffline $true

Detach, snapshot, and clean up (both platforms):

# Detach the volume
aws ec2 detach-volume \
  --region eu-west-1 \
  --volume-id vol-0abc123def456789

# Create a final snapshot of the cleaned volume
aws ec2 create-snapshot \
  --region eu-west-1 \
  --volume-id vol-0abc123def456789 \
  --description "Recovered and cleaned snapshot from us-east-1"

# Terminate the temporary instance
aws ec2 terminate-instances \
  --region eu-west-1 \
  --instance-ids i-0abc123def456789

# Delete the recovery volume (after snapshot is complete)
aws ec2 delete-volume \
  --region eu-west-1 \
  --volume-id vol-0abc123def456789

# Delete the local image file if it is still on the instance
# rm snap-0abcdef1234567890.img
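
Because the recovery volume should be deleted only after the final snapshot is complete, the snapshot and delete steps above can be chained with a waiter. The following is a sketch under stated assumptions: the function name snapshot_then_delete_volume is illustrative, and the AWS CLI waiter polls for a limited number of attempts, so it may need to be rerun for large volumes.

```shell
# Create a snapshot of a volume, wait until it completes, then delete the volume.
# Prints the new snapshot ID on success.
snapshot_then_delete_volume() {
  local region=$1 volume_id=$2 snapshot_id
  snapshot_id=$(aws ec2 create-snapshot --region "$region" --volume-id "$volume_id" \
    --query 'SnapshotId' --output text) || return 1
  # Block until the snapshot reaches the completed state.
  aws ec2 wait snapshot-completed --region "$region" --snapshot-ids "$snapshot_id" || return 1
  aws ec2 delete-volume --region "$region" --volume-id "$volume_id" || return 1
  echo "$snapshot_id"
}

# Example (not run here; requires AWS credentials and a detached volume):
#   snapshot_then_delete_volume eu-west-1 vol-0abc123def456789
```

If the waiter fails or times out, the function returns before the delete step, so the volume is kept.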

Related resources

AWS
EXPERT
published 2 months ago