CannotPullContainerError: no space left on device

Since today I have been seeing several failures on batch jobs due to the following error:

CannotPullContainerError: failed to register layer: Error processing tar file(exit status 1): write /opt/conda/lib/libmkl_avx512.so: no space left on device

I am using the standard AWS ECS AMI, and my compute environment is set up to use 'm4 family' so I would expect it to choose an instance that has enough resources.

Is there anything I can do to prevent this?

4TufZB
asked 4 years ago, 2225 views
2 answers

Hello,

To fix this issue, you need to update the dm.basesize value in the Docker configuration. Please refer to this article for the steps:

https://aws.amazon.com/premiumsupport/knowledge-center/batch-job-failure-disk-space/
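Before changing anything, it can help to confirm what per-container size Docker is currently using. The sketch below assumes the instance runs Docker with the devicemapper storage driver, whose `docker info` output includes a "Base Device Size" line; the helper name `base_device_size` is made up for illustration.

```shell
#!/bin/bash
# Sketch: report the per-container base size Docker is currently using.
# Assumption: the devicemapper storage driver is in use, so `docker info`
# prints a "Base Device Size" line. The function name is hypothetical.

# Read `docker info` text on stdin and print the "Base Device Size" value.
base_device_size() {
  awk -F': ' '/Base Device Size/ { print $2; exit }'
}

# On the container instance (not run here):
#   docker info 2>/dev/null | base_device_size
#   df -h /var/lib/docker
```

If the reported size is smaller than the unpacked image, the pull fails with "no space left on device" even when the instance volume itself still has free space.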

AWS
answered 4 years ago
Expert
reviewed 24 days ago

Thanks Anish, that helped. For anyone else having trouble with this, you need to:

  1. increase the size of the disk available to docker (/dev/xvdcz) in the launch template
  2. increase the disk quota available to each container (dm.basesize) through the user data in the launch template. I used:
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="==BOUNDARY=="

--==BOUNDARY==
Content-Type: text/cloud-boothook; charset="us-ascii"

#cloud-boothook
#!/bin/bash
cloud-init-per once docker_options echo 'OPTIONS="${OPTIONS} --storage-opt dm.basesize=30G"' >> /etc/sysconfig/docker

--==BOUNDARY==
Content-Type: text/x-shellscript; charset="us-ascii"

#!/bin/bash
# more aggressive settings for cleaning up old tasks
echo ECS_ENGINE_TASK_CLEANUP_WAIT_DURATION=1m >> /etc/ecs/ecs.config
echo ECS_IMAGE_CLEANUP_INTERVAL=1m >> /etc/ecs/ecs.config
echo ECS_IMAGE_MINIMUM_CLEANUP_AGE=1m >> /etc/ecs/ecs.config

--==BOUNDARY==--
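Step 1 is not shown in the user data above. As a rough sketch, the launch template data could be generated like this, assuming the user data is saved in a local file. The function name, the 100 GiB size, the gp2 volume type, the file name, and the template name are all illustrative choices, not values from the original answer. The volume should be at least as large as the dm.basesize you set.

```shell
#!/bin/bash
# Sketch: build launch template data that enlarges the Docker volume
# (/dev/xvdcz) and attaches a user data file.
# Assumptions: function name, sizes, and file names are hypothetical.

# Print launch template JSON for a volume size (GiB) and a user data file.
launch_template_data() {
  local size_gb="$1" user_data_file="$2"
  local encoded
  # Launch templates expect user data to be base64-encoded on one line.
  encoded="$(base64 < "$user_data_file" | tr -d '\n')"
  cat <<EOF
{
  "BlockDeviceMappings": [
    {
      "DeviceName": "/dev/xvdcz",
      "Ebs": { "VolumeSize": ${size_gb}, "VolumeType": "gp2", "DeleteOnTermination": true }
    }
  ],
  "UserData": "${encoded}"
}
EOF
}

# Example (not run here; the template name is a placeholder):
#   aws ec2 create-launch-template \
#     --launch-template-name batch-large-docker-volume \
#     --launch-template-data "$(launch_template_data 100 user-data.txt)"
```

The compute environment then has to reference this launch template for new instances to pick up the larger volume and the Docker options.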
4TufZB
answered 4 years ago