
No CUDA available on EC2 instance g5.xlarge


I am trying to test an instance I recently created, but I can't find a CUDA-compatible GPU, even though the instance details say it has one NVIDIA A10G.

My Python code:

import torch

if torch.cuda.is_available():
    device_index = torch.cuda.current_device()
    gpu_name = torch.cuda.get_device_name(device_index)
    print(f"Name of current GPU: {gpu_name}")
else:
    print("No GPU available, using CPU")

On Ubuntu, nvidia-smi is not installed.
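When torch.cuda.is_available() returns False, it helps to separate two common causes: a CPU-only PyTorch build and a missing NVIDIA driver. The sketch below is a hypothetical diagnostic helper (diagnose_cuda is not part of PyTorch). It relies only on torch.version.cuda, which is None for CPU-only builds, and on whether nvidia-smi, which is installed alongside the NVIDIA driver, can be found on the PATH.

```python
import shutil
from typing import Optional


def diagnose_cuda(cuda_build: Optional[str], smi_found: bool, cuda_available: bool) -> str:
    """Hypothetical helper: map observed facts to a likely cause."""
    if cuda_available:
        return "CUDA is available"
    if cuda_build is None:
        # torch.version.cuda is None for CPU-only PyTorch builds
        return "PyTorch is a CPU-only build; install a CUDA-enabled build"
    if not smi_found:
        # nvidia-smi is installed with the NVIDIA driver, so its absence suggests no driver
        return "NVIDIA driver not found; install the driver"
    return "Driver and CUDA build present; check driver/CUDA version compatibility"


def main() -> None:
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed")
        return
    smi_found = shutil.which("nvidia-smi") is not None
    print(diagnose_cuda(torch.version.cuda, smi_found, torch.cuda.is_available()))


if __name__ == "__main__":
    main()
```

On the instance described here, the missing nvidia-smi would point to the driver branch, provided the installed PyTorch is a CUDA build.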

3 Answers

Instead of installing the NVIDIA driver manually, you can use the AWS Deep Learning AMIs, which come with the NVIDIA driver, CUDA Toolkit, Docker, other software and, optionally, PyTorch pre-installed. Search for "Deep Learning OSS Nvidia Driver AMI GPU PyTorch" when launching the EC2 instance (screenshot below).

[Screenshot: DLAMI search]

Release notes at https://docs.aws.amazon.com/dlami/latest/devguide/aws-deep-learning-ami-gpu-pytorch-2.7-ubuntu-22-04.html
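If you would rather find the DLAMI ID programmatically than through the console search, a minimal sketch using boto3's EC2 describe_images call could look like the following. The AMI name pattern (the Ubuntu 22.04 variant) and the default region are assumptions; confirm the exact AMI name in the release notes for your region. newest_image_id and lookup_dlami are hypothetical helper names.

```python
from typing import Any, Dict, List, Optional

# Assumed name pattern for the Ubuntu 22.04 PyTorch DLAMI; adjust to the name you need
NAME_PATTERN = "Deep Learning OSS Nvidia Driver AMI GPU PyTorch * (Ubuntu 22.04) *"


def newest_image_id(images: List[Dict[str, Any]]) -> Optional[str]:
    """Return the ImageId with the latest CreationDate, or None for an empty list."""
    if not images:
        return None
    # CreationDate is an ISO 8601 string, so string order matches time order
    newest = max(images, key=lambda img: img["CreationDate"])
    return newest["ImageId"]


def lookup_dlami(region: str = "us-east-1") -> Optional[str]:
    """Query EC2 for Amazon-owned images matching NAME_PATTERN (needs AWS credentials)."""
    import boto3  # imported here so newest_image_id works without boto3 installed

    ec2 = boto3.client("ec2", region_name=region)
    response = ec2.describe_images(
        Owners=["amazon"],
        Filters=[
            {"Name": "name", "Values": [NAME_PATTERN]},
            {"Name": "state", "Values": ["available"]},
        ],
    )
    return newest_image_id(response["Images"])
```

With AWS credentials configured, print(lookup_dlami("us-east-1")) would print the most recent matching AMI ID, which you can then pass to your launch template or run-instances call.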

If you need a graphical desktop, you can refer to Deep Learning graphical desktop on Ubuntu Linux with AWS Deep Learning AMI (DLAMI) or Deep Learning AMI.

If you want to install the NVIDIA driver manually, the article "Install NVIDIA GPU driver, CUDA Toolkit, NVIDIA Container Toolkit on Amazon EC2 instances running Ubuntu Linux" can help.

AWS
EXPERT
answered a year ago

It sounds like you're experiencing an issue with CUDA not being available on your G5 instance. The G5 instances do come with NVIDIA A10G Tensor Core GPUs, but you need to install the necessary drivers and CUDA toolkit to use them.

For your Ubuntu-based G5 instance, you'll need to:

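  1. Install the NVIDIA GPU driver
  2. Install the CUDA Toolkit (needed to compile CUDA code; the official CUDA builds of PyTorch bundle their own CUDA runtime libraries and mainly need the driver)
  3. Install a CUDA-enabled build of PyTorch, since a CPU-only build will never detect the GPU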

Here's how to install the NVIDIA driver and CUDA Toolkit on your Ubuntu instance:

First, update your system:

sudo apt-get update
sudo apt-get upgrade -y

Then install the NVIDIA driver:

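sudo apt-get install -y nvidia-driver-latest

Note that Ubuntu's repositories generally publish versioned driver packages (for example nvidia-driver-535), so this command may fail with "Unable to locate package". Running ubuntu-drivers devices lists the driver packages available for your GPU; install one of those by name instead.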
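Because the driver package name depends on the Ubuntu release, one way to pick a name that actually exists is to read the package that ubuntu-drivers devices marks as recommended. The sketch below assumes that command's usual output format (lines such as "driver   : nvidia-driver-535 - distro non-free recommended") and that the ubuntu-drivers-common package is installed; recommended_driver and query_recommended_driver are hypothetical helper names.

```python
import re
import subprocess
from typing import Optional

# Assumed line format: "driver   : <package> - distro non-free recommended"
DRIVER_LINE = re.compile(r"^driver\s*:\s*(\S+)\s.*\brecommended\b", re.MULTILINE)


def recommended_driver(devices_output: str) -> Optional[str]:
    """Return the package marked 'recommended' in `ubuntu-drivers devices` output."""
    match = DRIVER_LINE.search(devices_output)
    return match.group(1) if match else None


def query_recommended_driver() -> Optional[str]:
    """Run `ubuntu-drivers devices` (Ubuntu only) and extract the recommended package."""
    result = subprocess.run(
        ["ubuntu-drivers", "devices"], capture_output=True, text=True, check=True
    )
    return recommended_driver(result.stdout)
```

You would then install the returned package with sudo apt-get install -y followed by its name, and reboot before checking nvidia-smi.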

For the CUDA Toolkit, ensure you have at least 20 GB of free storage space, then install it:

cd /tmp
wget https://developer.download.nvidia.com/compute/cuda/12.4.1/local_installers/cuda_12.4.1_550.54.15_linux.run
chmod +x ./cuda_*.run
sudo ./cuda_*.run --toolkit --silent
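The 20 GB requirement can be checked ahead of time. This is a minimal sketch using shutil.disk_usage; it assumes the runfile is downloaded to /tmp, as in the commands above, and that the toolkit goes under /usr/local (the runfile's default install prefix). has_free_space is a hypothetical helper name.

```python
import os
import shutil

# 20 GiB, slightly stricter than 20 GB
REQUIRED_BYTES = 20 * 1024**3


def has_free_space(path: str, required_bytes: int = REQUIRED_BYTES) -> bool:
    """Return True if the filesystem holding `path` has at least `required_bytes` free."""
    return shutil.disk_usage(path).free >= required_bytes


if __name__ == "__main__":
    # /tmp holds the downloaded runfile; /usr/local receives the toolkit by default
    for path in ("/tmp", "/usr/local"):
        if os.path.exists(path):
            print(path, "OK" if has_free_space(path) else "less than 20 GiB free")
```

If either location is short on space, enlarging the instance's EBS root volume before running the installer avoids a partially installed toolkit.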

After installation, you'll need to restart your instance:

sudo reboot

Once your instance is back up, you should be able to run nvidia-smi to verify the driver installation. You may also need to set up environment variables for CUDA:

export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
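After the reboot, a small script can confirm both the driver and the toolkit. This sketch calls nvidia-smi with its --query-gpu=name,driver_version and --format=csv,noheader options and looks for nvcc, which ships with the CUDA Toolkit, on the PATH; parse_gpu_csv and check_install are hypothetical helper names.

```python
import shutil
import subprocess
from typing import List, Tuple


def parse_gpu_csv(output: str) -> List[Tuple[str, str]]:
    """Parse `nvidia-smi --query-gpu=name,driver_version --format=csv,noheader` output."""
    gpus = []
    for line in output.strip().splitlines():
        if "," not in line:
            continue  # skip anything that is not a "name, driver" pair
        name, driver = line.split(",", 1)
        gpus.append((name.strip(), driver.strip()))
    return gpus


def check_install() -> None:
    if shutil.which("nvidia-smi") is None:
        print("nvidia-smi not found: the NVIDIA driver is not installed")
    else:
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=name,driver_version", "--format=csv,noheader"],
            capture_output=True, text=True, check=True,
        ).stdout
        for name, driver in parse_gpu_csv(out):
            print(f"GPU: {name}, driver {driver}")
    # nvcc is only found if /usr/local/cuda/bin has been added to PATH
    nvcc = shutil.which("nvcc")
    print(f"nvcc: {nvcc}" if nvcc else "nvcc not on PATH: add /usr/local/cuda/bin to PATH")


if __name__ == "__main__":
    check_install()
```

Once both checks pass, the PyTorch snippet from the question should report the A10G, assuming a CUDA-enabled PyTorch build is installed.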

Alternatively, you could consider using the AWS Deep Learning Base GPU AMI, which comes pre-configured with CUDA (including version 12.6) and has added support specifically for G5 EC2 instances. This would save you the trouble of manual installation.
Sources
AWS Deep Learning Base GPU AMI (Amazon Linux 2023) - AWS Deep Learning AMIs
Install NVIDIA GPU driver, CUDA Toolkit, NVIDIA Container Toolkit on Amazon EC2 instances running Ubuntu Linux | AWS re:Post
Amazon EC2 G5 Instances | Amazon Web Services
How do I install NVIDIA GPU driver, CUDA Toolkit, NVIDIA Container Toolkit on Amazon EC2 instances running Amazon Linux 2 (AL2)? | AWS re:Post

answered a year ago

The command:

sudo apt-get install -y nvidia-driver-latest

gives this response:

Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
E: Unable to locate package nvidia-driver-latest

answered a year ago

