CannotPullContainerError: no space left on device

Starting today, several of my AWS Batch jobs have been failing with the following error:

CannotPullContainerError: failed to register layer: Error processing tar file(exit status 1): write /opt/conda/lib/libmkl_avx512.so: no space left on device

I am using the standard Amazon ECS-optimized AMI, and my compute environment is configured to use the m4 instance family, so I would expect AWS Batch to choose an instance with enough resources.

Is there anything I can do to prevent this?

4TufZB
asked 4 years ago
2 Answers

Hello,

To fix this issue, increase the dm.basesize storage option in the Docker configuration. This article explains how:

https://aws.amazon.com/premiumsupport/knowledge-center/batch-job-failure-disk-space/
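To confirm which limit an instance is actually using, you can check the "Base Device Size" line that `docker info` prints when Docker uses the devicemapper storage driver. That driver is presumably in use here, given the dm.basesize option and the /dev/xvdcz volume. The default dm.basesize is 10G. The helper below is a minimal sketch that extracts that value. The function name `base_device_size` is hypothetical, and the exact `docker info` output format may vary between Docker versions.

```shell
#!/bin/bash
# Hypothetical helper: read `docker info` output on stdin and print the value
# of the "Base Device Size" field (only reported by the devicemapper driver).
# Prints nothing if the field is absent, e.g. with the overlay2 driver.
base_device_size() {
  awk -F': ' '/Base Device Size/ { print $2; exit }'
}

# On the container instance itself (requires Docker and sufficient permissions):
#   docker info 2>/dev/null | base_device_size
```

A value of roughly 10.74 GB (10 GiB shown in decimal units) would suggest the instance is still using the default limit.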

AWS
answered 4 years ago

Thanks Anish, that helped. For anyone else having this problem, you need to:

  1. increase the size of the volume Docker uses for storage (/dev/xvdcz) in the launch template
  2. increase the disk space available to each container (the dm.basesize storage option) in the launch template's user data. I used:
```
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="==BOUNDARY=="

--==BOUNDARY==
Content-Type: text/cloud-boothook; charset="us-ascii"

#cloud-boothook
#!/bin/bash
# Boothooks run on every boot; cloud-init-per "once" makes sure the echo
# (and therefore the append) happens only on the first boot.
# Raise the devicemapper base device size per container from the default 10G to 30G.
cloud-init-per once docker_options echo 'OPTIONS="${OPTIONS} --storage-opt dm.basesize=30G"' >> /etc/sysconfig/docker

--==BOUNDARY==
Content-Type: text/x-shellscript; charset="us-ascii"

#!/bin/bash
# More aggressive cleanup of stopped task containers and unused images
echo ECS_ENGINE_TASK_CLEANUP_WAIT_DURATION=1m >> /etc/ecs/ecs.config
echo ECS_IMAGE_CLEANUP_INTERVAL=1m >> /etc/ecs/ecs.config
echo ECS_IMAGE_MINIMUM_CLEANUP_AGE=1m >> /etc/ecs/ecs.config

--==BOUNDARY==--
```
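For step 1, here is a rough sketch that assumes you manage the launch template with the AWS CLI. The helper below assembles LaunchTemplateData that enlarges /dev/xvdcz and embeds the multipart user data, which EC2 launch templates expect to be base64-encoded. The file name user-data.txt, the 100 GiB gp2 volume, the template name, and the function name `build_launch_template_data` are illustrative assumptions, not values from this thread.

```shell
#!/bin/bash
# Hypothetical helper: print LaunchTemplateData JSON that
#  (1) sizes the Docker volume /dev/xvdcz to the given number of GiB, and
#  (2) embeds the given user-data file, base64-encoded on a single line.
build_launch_template_data() {
  local user_data_file="$1" volume_gib="$2"
  local encoded
  # Strip newlines because some base64 implementations wrap long output.
  encoded=$(base64 < "$user_data_file" | tr -d '\n')
  cat <<EOF
{
  "BlockDeviceMappings": [
    {
      "DeviceName": "/dev/xvdcz",
      "Ebs": { "VolumeSize": ${volume_gib}, "VolumeType": "gp2", "DeleteOnTermination": true }
    }
  ],
  "UserData": "${encoded}"
}
EOF
}

# Example (requires AWS credentials; the template name is hypothetical):
#   build_launch_template_data user-data.txt 100 > lt-data.json
#   aws ec2 create-launch-template --launch-template-name batch-bigger-docker \
#     --launch-template-data file://lt-data.json
```

Instances that are already running are not changed. Only instances launched from the updated template pick up the larger volume and the new Docker options.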
4TufZB
answered 4 years ago
