Extremely slow read performance of custom gp3 AMI image


I recently created an AMI using Packer, and when I launch a g4dn.2xlarge instance from the AMI, the instance starts, but read performance on the root volume is terrible: around 5 MB/s.

iostat shows ~5 MB/s of reads at ~30 reads/s, with the disk at 100% utilization. Writes to the disk are reasonably fast. The whole point of the AMI was to preload some large files that will be needed, but it now takes 10 minutes to load a 6 GB file. I am hoping to build AMIs in many different sizes, so this is a big concern for me.

Some information on the AMI:

  • Built with Packer, using the latest AWS Deep Learning Base AMI GPU CUDA 11 (Ubuntu 20.04) as the base image, with the ebs+hvm option
  • Block device settings:

```hcl
device_name           = "/dev/sda1"
volume_size           = 120
volume_type           = "gp3"
delete_on_termination = true
```

I've confirmed the poor disk reads both in the instance monitoring graphs in the console and with iostat, and I've confirmed in the console that the root EBS volume of the launched instance is gp3. What's going on, and how can I fix this?

asked 9 months ago · 299 views
2 Answers

The low throughput and high latency you're experiencing most likely relate to the lazy loading of blocks from Amazon S3 when EBS volumes are restored from snapshots (which includes volumes created from AMIs). See the explanation of lazy loading below:

For volumes that were created from snapshots, the storage blocks must be pulled down from Amazon S3 and written to the volume before you can access them. This preliminary action takes time and can cause a significant increase in the latency of I/O operations the first time each block is accessed. Volume performance is achieved after all blocks have been downloaded and written to the volume.

Reference: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-initialize.html

How can you fix this? Review the recommendations explained in the blog post "Addressing I/O latency when restoring Amazon EBS volumes from EBS Snapshots":

  1. EBS initialization
  2. Fast snapshot restore (FSR)

There are pros and cons to both: EBS initialization is a manual action but free, while FSR instantly delivers all of the volume's provisioned performance but has a cost associated with it. You will have to decide which of the two options is more suitable.
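For reference, here's a minimal sketch of each option. The device name, snapshot ID, and Availability Zone below are placeholders; on a Nitro-based instance such as g4dn, the root volume typically appears as /dev/nvme0n1 (check with lsblk):

```bash
# Option 1: EBS initialization -- read every block once so that subsequent
# reads come from the volume at full speed instead of lazily from S3.
sudo apt-get install -y fio
sudo fio --filename=/dev/nvme0n1 --rw=read --bs=1M --iodepth=32 \
    --ioengine=libaio --direct=1 --name=volume-initialize

# Option 2: enable FSR on the AMI's snapshot, per Availability Zone.
# snap-0123456789abcdef0 and us-east-1a are placeholder values.
aws ec2 enable-fast-snapshot-restores \
    --source-snapshot-ids snap-0123456789abcdef0 \
    --availability-zones us-east-1a
```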

AWS
AntAWS
answered 9 months ago
  • AntAWS, I suspected this might be the case, and I tested with FSR, which gave slightly better performance. (My startup scripts ran ~2x faster; still slow compared to my expectations, but application startup went from 10 minutes to 5 minutes.)

    HOWEVER, I would like to release my AMIs on the Marketplace, and FSR is very expensive and would also need to be enabled in every region where my AMI is available. How does this work with Marketplace AMIs?


I would suggest trying the following:

  1. Without FSR, is the gp3 EBS volume close to hitting its IOPS or throughput limit? If so, increase the provisioned IOPS (max 16,000) and, potentially, the provisioned throughput (max 1,000 MiB/s) to see if this improves disk read performance at instance startup (see the sketch after this list).
  2. Add a startup script to the AMI that pre-initializes the volume after the instance launches.
  3. Since write performance is not affected by lazy loading, try copying the 6 GB file onto the instance (e.g. from S3) via a startup script instead of baking it into the AMI (also sketched below).
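If it helps, here's a minimal sketch of these suggestions. The volume ID, device name, and S3 path are placeholders, and the script assumes the root volume appears as /dev/nvme0n1 (check with lsblk):

```bash
# Suggestion 1: raise the gp3 volume's provisioned IOPS and throughput.
# vol-0123456789abcdef0 is a placeholder volume ID.
aws ec2 modify-volume \
    --volume-id vol-0123456789abcdef0 \
    --iops 16000 \
    --throughput 1000
```

```bash
#!/bin/bash
# Suggestions 2 and 3: a hypothetical user-data / startup script (runs as root).
# Pre-initialize the root volume in the background by reading every block once,
# and fetch the large file from S3 rather than waiting on lazily loaded blocks.
fio --filename=/dev/nvme0n1 --rw=read --bs=1M --iodepth=32 \
    --ioengine=libaio --direct=1 --name=volume-initialize &

# s3://my-bucket/large-model.bin is a placeholder path.
aws s3 cp s3://my-bucket/large-model.bin /opt/app/large-model.bin
```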

As for whether or how FSR works with Marketplace AMIs, there doesn't appear to be a precedent for this. I'd suggest contacting AWS Support in relation to this query.

AWS
AntAWS
answered 9 months ago
