AWS Batch GPU busy or unavailable

I'm trying to deploy a Python app to AWS Batch in a Docker container that uses CUDA. When I try to run a Batch job, I get this error:

RuntimeError: CUDA error: all CUDA-capable devices are busy or unavailable
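The framework isn't stated above, but that error string matches what PyTorch raises when CUDA device initialization fails. Assuming PyTorch (an assumption, not confirmed by the question), a minimal in-container check along these lines is roughly where the error would surface:

```python
import torch

# Quick in-container sanity check; the "busy or unavailable" error
# usually appears on the first attempt to touch the device.
print("CUDA available:", torch.cuda.is_available())
print("Device count:", torch.cuda.device_count())

# Allocating a tensor on the GPU is typically where the RuntimeError is raised.
x = torch.zeros(1, device="cuda")
print("Tensor device:", x.device)
```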

I'm a bit confused, as I thought AWS Batch would assign an EC2 instance with an available GPU; I request at least 1 GPU when I submit the job (sketch below). I haven't had any luck finding anyone with the same issue. It's possible I misconfigured something in my Dockerfile or in AWS Batch, but it seems like the code is accessing the GPU correctly and something on AWS's end is failing. Let me know if you need any other info from me.
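For reference, the GPU request goes through the job's resource requirements. A sketch of what such a submission might look like with boto3, with placeholder queue and job definition names:

```python
import boto3

batch = boto3.client("batch")

# Placeholder names -- substitute the actual queue and job definition.
response = batch.submit_job(
    jobName="cuda-app-test",
    jobQueue="my-gpu-queue",
    jobDefinition="my-cuda-app:1",
    containerOverrides={
        # Requesting at least 1 GPU for the container.
        "resourceRequirements": [
            {"type": "GPU", "value": "1"},
        ],
    },
)
print("Submitted job:", response["jobId"])
```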

Docker environment: nvidia/cuda:11.6.0-cudnn8-devel-ubuntu20.04

Compute Environment: p2-family EC2s (not spot instances)
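For completeness, a rough sketch of how a managed, on-demand p2 compute environment like this might be defined with boto3; the network and role values are placeholders, not the actual configuration:

```python
import boto3

batch = boto3.client("batch")

# Illustrative only -- subnets, security groups, and roles are placeholders.
batch.create_compute_environment(
    computeEnvironmentName="gpu-p2-env",
    type="MANAGED",
    computeResources={
        "type": "EC2",                 # on-demand instances, not Spot
        "minvCpus": 0,
        "maxvCpus": 16,
        "instanceTypes": ["p2"],       # p2 instance family
        "subnets": ["subnet-xxxxxxxx"],
        "securityGroupIds": ["sg-xxxxxxxx"],
        "instanceRole": "ecsInstanceRole",
    },
    serviceRole="AWSBatchServiceRole",
)
```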

jlin
asked 2 years ago · 58 views
No Answers
