Slow IOPS on EC2 batch job

Hi, we're facing an issue when using AWS Batch with EC2 instances: IOPS are much lower when the job runs inside a Docker container. Below is the evidence we collected:

  • Instance type: m6idn.large
  • The /rocketpva/ directory is mapped to the instance store disk.
  • According to https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/general-purpose-instances.html, the m6idn.large instance type can deliver up to 33.5k IOPS.

When running a benchmark program (fio) to simulate our application workload we got the following results:

  • outside docker container: throughput: +/- 136MiB/s, iops=34009
  • inside docker container: throughput: +/- 11.8MiB/s, iops=3023

As we can see, when running from a Docker container we got roughly 11 times lower IOPS and throughput.
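The size of the gap can be checked with quick arithmetic on the figures reported above. The sketch below uses only those four numbers; the helper names are made up for illustration. Dividing throughput by IOPS also gives the average I/O size, which suggests (but does not confirm) that the benchmark used blocks of about 4 KiB in both runs.

```python
MIB = 1024 * 1024

# Figures reported in the question (fio results on host and in container).
host = {"throughput_mib_s": 136.0, "iops": 34009}
container = {"throughput_mib_s": 11.8, "iops": 3023}


def slowdown(host_value, container_value):
    """Return how many times lower the container figure is than the host figure."""
    return host_value / container_value


def implied_block_size_kib(throughput_mib_s, iops):
    """Average size of one I/O, in KiB, implied by throughput and IOPS."""
    return throughput_mib_s * MIB / iops / 1024


iops_ratio = slowdown(host["iops"], container["iops"])
throughput_ratio = slowdown(host["throughput_mib_s"], container["throughput_mib_s"])
host_block_kib = implied_block_size_kib(**host)
container_block_kib = implied_block_size_kib(**container)

print(f"IOPS ratio:        {iops_ratio:.2f}x")
print(f"Throughput ratio:  {throughput_ratio:.2f}x")
print(f"Host I/O size:      {host_block_kib:.2f} KiB")
print(f"Container I/O size: {container_block_kib:.2f} KiB")
```

Both ratios come out a little above 11, so IOPS and throughput drop together, as expected when the I/O size stays the same.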

Is there anything we can do to fully utilize the instance store disk from inside a Docker container using AWS Batch?

Detinho
asked 10 months ago, 298 views
1 Answer
Greetings!

I understand that you have noticed differences in IOPS availability inside the docker container of a Batch job and on the EC2 host where the job is executing. You can correct me if this understanding is incorrect.

In theory there should be no difference, because by default the storage available to the underlying EC2 instance is exposed unchanged to the containers running on it through a storage driver, as highlighted in this thread [1].
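Whether I/O actually passes through the storage driver depends on where the benchmark directory lives inside the container: files written to the container's writable layer go through the driver, while bind mounts and volumes do not. A minimal sketch for checking this, assuming a Linux system with /proc/mounts, is shown below. The function names are hypothetical, and the parser ignores escaped characters in mount points. It is a diagnostic aid, not an explanation of the numbers in the question.

```python
import os


def parse_mounts(mounts_text):
    """Parse /proc/mounts-style text into (mount_point, device, fs_type) tuples."""
    entries = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            device, mount_point, fs_type = fields[0], fields[1], fields[2]
            entries.append((mount_point, device, fs_type))
    return entries


def backing_mount(path, entries):
    """Return the entry whose mount point is the longest prefix of path."""
    path = os.path.normpath(path)
    best = None
    for mount_point, device, fs_type in entries:
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or (path + "/").startswith(prefix):
            # ">=" lets a later mount on the same mount point win.
            if best is None or len(mount_point) >= len(best[0]):
                best = (mount_point, device, fs_type)
    return best


def describe(path, mounts_file="/proc/mounts"):
    """Read the mount table and report what backs the given path."""
    with open(mounts_file) as handle:
        return backing_mount(path, parse_mounts(handle.read()))

# Example (run on the host and again inside the container, then compare):
# print(describe("/rocketpva"))
```

If the container reports a filesystem type such as overlay for the directory while the host reports the instance store device, the two benchmarks are not exercising the same I/O path.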

I tried to replicate this in my internal environment: I launched an Alpine job and compared the fio output between the Alpine Docker container and the underlying host, following the procedure described here [2]. However, in a compute environment running m4.large instances, I saw ~3100 IOPS on average on both the host and the container.

That said, we would like to help you further. To do so we need non-public details specific to your account, such as the compute environment name, the AMI you are using, and the steps or commands used to run the benchmark.
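One way to make the benchmark steps reproducible is to script the fio run so that exactly the same job executes on the host and inside the container. The sketch below is an assumption, not the job used in this thread: the original fio parameters were not shared, so the block size, queue depth, file size, and runtime are illustrative values, and the function names are hypothetical.

```python
import json
import shutil
import subprocess


def build_fio_command(directory, block_size="4k", runtime_s=30):
    """Build a random-read fio command line. All job parameters are assumptions."""
    return [
        "fio",
        "--name=randread-test",
        f"--directory={directory}",   # where fio creates its test file
        "--rw=randread",
        f"--bs={block_size}",
        "--size=1g",
        "--ioengine=libaio",
        "--direct=1",                 # bypass the page cache
        "--iodepth=32",
        f"--runtime={runtime_s}",
        "--time_based",
        "--output-format=json",
    ]


def summarize(fio_json_text):
    """Extract read IOPS and bandwidth in MiB/s from fio's JSON output."""
    read = json.loads(fio_json_text)["jobs"][0]["read"]
    # fio reports "bw" in KiB/s.
    return {"iops": read["iops"], "mib_s": read["bw"] / 1024}


def run_benchmark(directory):
    """Run fio against the directory and return the summarized result."""
    if shutil.which("fio") is None:
        raise RuntimeError("fio is not installed")
    result = subprocess.run(
        build_fio_command(directory), capture_output=True, text=True, check=True
    )
    return summarize(result.stdout)

# Example: print(run_benchmark("/rocketpva"))
```

Running the same script in both places and attaching the two summaries, together with the full command line, gives support a like-for-like comparison to work from.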

Please open a support case with the AWS Batch team using the following link [3], and we will be glad to help resolve this issue.

Have a great day!

[1] https://github.com/moby/moby/issues/21485
[2] https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/benchmark_procedures.html
[3] https://console.aws.amazon.com/support/home#/case/create

AWS
SUPPORT ENGINEER
answered 10 months ago
  • Hi. Thanks for your answer. We're using m6idn.large (and other "d" instance types) so that we can use the instance store and get higher IOPS. Since you mentioned you tested on an m4.large instance, I assume you ran the benchmark on an EBS volume. If so, it's a different scenario. In any case, we opened a case and the support team was able to reproduce the same numbers we got, so maybe there is some kind of corner case here. Thanks again for your answer.
