AWS Batch GPU busy or unavailable


I'm trying to deploy a CUDA-enabled Python app, packaged in a Docker container, to AWS Batch. When I run a Batch job I get this error:

RuntimeError: CUDA error: all CUDA-capable devices are busy or unavailable

I'm a bit confused, since I thought AWS Batch would place the job on an EC2 instance with an available GPU; I request at least 1 GPU when I submit the job (see the sketch below). I haven't had any luck finding anyone with the same issue. It's possible I misconfigured something in my Dockerfile or in AWS Batch, but from the error it looks like my code is reaching the CUDA runtime and something on AWS's end is off. Let me know if you need any other info from me.
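For context, the GPU request at submit time is equivalent to something like this (a minimal boto3 sketch; the job name, queue, and job definition names are placeholders for my actual ones):

import boto3

batch = boto3.client("batch", region_name="us-east-1")

# Submit the job, requesting one GPU via resourceRequirements.
# Names below are placeholders, not my real resources.
response = batch.submit_job(
    jobName="cuda-app-test",
    jobQueue="gpu-job-queue",
    jobDefinition="cuda-app-job-def",
    containerOverrides={
        "resourceRequirements": [
            {"type": "GPU", "value": "1"},
        ]
    },
)
print(response["jobId"])

The job gets scheduled and starts running on the p2 compute environment; the error only appears once the Python code inside the container tries to use the GPU.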

Docker environment: nvidia/cuda:11.6.0-cudnn8-devel-ubuntu20.04

Compute Environment: p2-family EC2s (not spot instances)

jlin
Asked 2 years ago · 59 views
No answers
