CannotPullContainerError: no space left on device


Since today, several of my AWS Batch jobs have been failing with the following error:

CannotPullContainerError: failed to register layer: Error processing tar file(exit status 1): write /opt/conda/lib/libmkl_avx512.so: no space left on device

I am using the standard Amazon ECS-optimized AMI, and my compute environment is configured to use the m4 instance family, so I expected Batch to choose an instance with enough resources.

Is there anything I can do to prevent this?
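One plausible reading of this error is that the image is larger than Docker's per-image size limit, not that the instance is too small: the instance type sets CPU and memory, while the space used during a pull comes from the AMI's Docker volume and storage-driver settings. Docker's documentation gives 10G as the default dm.basesize, which an image with a large Conda environment could exceed. A minimal sketch for checking this on an affected container instance, assuming the devicemapper storage driver (the one dm.basesize applies to), whose `docker info` output includes lines such as `Base Device Size` and `Data Space Available`. The helper name `docker_storage_summary` is hypothetical, and the field names may differ between Docker versions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: keep only the storage-related lines of `docker info`.
# Assumes the devicemapper driver; field names may vary across Docker versions.
docker_storage_summary() {
  # Read `docker info` text on stdin, select the storage fields,
  # and trim the leading indentation Docker adds.
  grep -E '^[[:space:]]*(Storage Driver|Base Device Size|Data Space (Used|Total|Available)):' \
    | sed -E 's/^[[:space:]]+//'
}

# On an ECS container instance (needs access to the Docker daemon):
#   docker info 2>/dev/null | docker_storage_summary
```

If `Base Device Size` is smaller than the unpacked image, raising dm.basesize is the relevant fix; if `Data Space Available` is low, the Docker volume itself is too small.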

4TufZB
Asked 4 years ago, 2,226 views
2 answers

Hello,

To fix this issue, you need to increase the dm.basesize value in the Docker daemon configuration. This article explains how to do that:

https://aws.amazon.com/premiumsupport/knowledge-center/batch-job-failure-disk-space/
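In practice, updating dm.basesize means appending a `--storage-opt dm.basesize=<size>` flag to `OPTIONS` in `/etc/sysconfig/docker` before the Docker daemon starts; the user data in the next answer does exactly that via `cloud-init-per`. Below is a minimal sketch of the same edit as a hypothetical shell function that skips the append when a dm.basesize option is already present, so running it twice is harmless. The file path is a parameter so it can be tried on a scratch file first. The option only matters for the devicemapper storage driver, and if Docker is already running it has to be restarted to pick up the change.

```shell
#!/usr/bin/env bash
# Hypothetical helper: add a dm.basesize storage option to a Docker sysconfig file.
# Example (as root, before Docker starts): set_docker_basesize /etc/sysconfig/docker 30G
set_docker_basesize() {
  local file="$1" size="$2"
  # Do nothing if a dm.basesize option is already configured, so repeated
  # runs (e.g. on every boot) do not stack duplicate flags.
  if grep -q 'dm\.basesize=' "$file" 2>/dev/null; then
    return 0
  fi
  # Single quotes keep ${OPTIONS} literal; it is expanded later, when the
  # sysconfig file is sourced by the Docker service.
  echo 'OPTIONS="${OPTIONS} --storage-opt dm.basesize='"$size"'"' >> "$file"
}
```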

AWS
Answered 4 years ago
Reviewed by an expert 24 days ago

Thanks Anish, that helped. For anyone else having trouble with this, you need to:

  1. increase the size of the disk available to Docker (/dev/xvdcz) in the launch template;
  2. increase the size limit for each image and container (Docker's dm.basesize option) in the launch template's user data. I used:
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="==BOUNDARY=="

--==BOUNDARY==
Content-Type: text/cloud-boothook; charset="us-ascii"

#cloud-boothook
#!/bin/bash
cloud-init-per once docker_options echo 'OPTIONS="${OPTIONS} --storage-opt dm.basesize=30G"' >> /etc/sysconfig/docker

--==BOUNDARY==
Content-Type: text/x-shellscript; charset="us-ascii"

#!/bin/bash
# more aggressive settings for cleaning up old tasks
echo ECS_ENGINE_TASK_CLEANUP_WAIT_DURATION=1m >> /etc/ecs/ecs.config
echo ECS_IMAGE_CLEANUP_INTERVAL=1m >> /etc/ecs/ecs.config
echo ECS_IMAGE_MINIMUM_CLEANUP_AGE=1m >> /etc/ecs/ecs.config

--==BOUNDARY==--
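The two steps can be combined in one launch template. As a rough sketch, assuming the template is created with the AWS CLI, the hypothetical function below base64-encodes a user data file such as the one above (launch template `UserData` must be base64-encoded when passed through the EC2 API) and writes a JSON document for `aws ec2 create-launch-template --launch-template-data` that also enlarges `/dev/xvdcz`. The 100 GiB default size, the `gp2` volume type, and the file and template names are illustrative assumptions, and `/dev/xvdcz` only applies to an AMI that, like the one in this question, uses that device for Docker storage.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the JSON for
#   aws ec2 create-launch-template --launch-template-data file://OUT_FILE
# Usage: make_launch_template_data USER_DATA_FILE OUT_FILE [DOCKER_VOLUME_GIB]
make_launch_template_data() {
  local user_data_file="$1" out_file="$2" size_gib="${3:-100}"  # 100 GiB is illustrative
  local user_data_b64
  [ -r "$user_data_file" ] || return 1
  # Launch template UserData must be base64-encoded; tr strips the line
  # wrapping that GNU and BSD base64 add by default.
  user_data_b64=$(base64 < "$user_data_file" | tr -d '\n')
  # Enlarge the Docker volume (/dev/xvdcz on this AMI) and attach the user data.
  cat > "$out_file" <<EOF
{
  "BlockDeviceMappings": [
    {
      "DeviceName": "/dev/xvdcz",
      "Ebs": {
        "VolumeSize": ${size_gib},
        "VolumeType": "gp2",
        "DeleteOnTermination": true
      }
    }
  ],
  "UserData": "${user_data_b64}"
}
EOF
}

# Example (the template name is a placeholder):
#   make_launch_template_data user-data.txt lt-data.json 100
#   aws ec2 create-launch-template --launch-template-name my-batch-lt \
#     --launch-template-data file://lt-data.json
```

A heredoc keeps the sketch free of extra dependencies; since the base64 text and the size contain no quotes, the JSON stays valid, but a tool such as jq would be safer if other values were added.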
4TufZB
Answered 4 years ago
