GPU fails to initialize on g5.xlarge instance

Hello,

I have tried to create several g5.xlarge instances with various "quickstart" AMIs (Deep Learning AMI GPU TensorFlow 2.7.0 (Amazon Linux 2) 20211111 - ami-0850c76a5926905fb, Deep Learning AMI (Ubuntu 18.04) Version 54.0, ...)

In all cases, the instance boots fine and both status checks pass, but the GPU is not accessible.

For example, with Deep Learning AMI (Ubuntu 18.04) Version 54.0, nvidia-smi gives the following error:

nvidia-smi
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

With 'dmesg' we can see the following errors:

[  308.148743] nvidia: probe of 0000:00:1e.0 failed with error -1
[  308.148756] NVRM: The NVIDIA probe routine failed for 1 device(s).
[  308.148756] NVRM: None of the NVIDIA devices were initialized.
[  308.148969] nvidia-nvlink: Unregistered the Nvlink Core, major device number 239

The installed NVIDIA packages are:

apt list --installed | grep -i nvidia

libnvidia-container-tools/bionic,now 1.7.0-1 amd64 [installed,automatic]
libnvidia-container1/bionic,now 1.7.0-1 amd64 [installed,automatic]
nvidia-container-toolkit/bionic,now 1.7.0-1 amd64 [installed]
nvidia-docker2/bionic,now 2.8.0-1 all [installed]
nvidia-fabricmanager-450/now 450.142.00-1 amd64 [installed,upgradable to: 450.156.00-0ubuntu0.18.04.1]

The drivers are not updated by a system update (I tried to unhold the packages and update the system, but that did not solve the issue). The held packages are:

apt-mark showhold
linux-aws
linux-headers-aws
linux-image-aws
nvidia-fabricmanager-450
tensorflow-model-server-neuron

Any idea what I could try to solve the issue?

Or do you know of another Deep Learning AMI that works with g5.xlarge?

Thanks !

asked 9 months ago, 1265 views
1 Answer

For EC2 G5 instances, you will need to use a Deep Learning AMI with CUDA 11.4 or later. References to those can be found in the Deep Learning AMI documentation.

EXPERT
answered 9 months ago
EXPERT
reviewed 3 months ago
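
As a way to check an instance against the answer above, here is a minimal sketch, not an official AWS procedure. It reads the version of the installed nvidia kernel module with modinfo and compares it against a minimum. The 470.42.01 threshold is an assumption (the Linux driver release that shipped alongside CUDA 11.4) and does not come from this thread. The helper names version_ge, strip_pkg_suffix and installed_driver_version are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: compare the installed NVIDIA driver version against a minimum.
# MIN_DRIVER is an ASSUMPTION (driver branch shipped with CUDA 11.4),
# not a value stated in this thread. Override it via the environment.
MIN_DRIVER="${MIN_DRIVER:-470.42.01}"

# version_ge A B: succeeds when version A >= version B (relies on GNU sort -V).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# strip_pkg_suffix V: drop a Debian package revision, e.g. "450.142.00-1" -> "450.142.00".
strip_pkg_suffix() {
    printf '%s\n' "${1%%-*}"
}

# installed_driver_version: print the nvidia kernel module version,
# or nothing when the module (or modinfo itself) is not available.
installed_driver_version() {
    modinfo -F version nvidia 2>/dev/null || true
}

current="$(installed_driver_version)"
if [ -z "$current" ]; then
    echo "No nvidia kernel module found by modinfo"
elif version_ge "$current" "$MIN_DRIVER"; then
    echo "Driver $current meets the assumed minimum $MIN_DRIVER"
else
    echo "Driver $current is older than the assumed minimum $MIN_DRIVER"
fi
```

With the package version listed in the question, version_ge "$(strip_pkg_suffix 450.142.00-1)" 470.42.01 fails. That would be consistent with a 450-branch driver being too old for the GPU in a G5 instance, though the nvidia-fabricmanager package version is only an indirect hint of the driver actually loaded.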
