GPU fails to initialize on g5.xlarge instance

Hello,

I have tried to create several g5.xlarge instances from various "quickstart" AMIs (Deep Learning AMI GPU TensorFlow 2.7.0 (Amazon Linux 2) 20211111 - ami-0850c76a5926905fb, Deep Learning AMI (Ubuntu 18.04) Version 54.0, ...).

In all cases, the instance boots fine and both status checks pass, but the GPU is not accessible.

For example, with Deep Learning AMI (Ubuntu 18.04) Version 54.0:

nvidia-smi gives the following error:

nvidia-smi
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

With 'dmesg' we can see the following errors:

[  308.148743] nvidia: probe of 0000:00:1e.0 failed with error -1
[  308.148756] NVRM: The NVIDIA probe routine failed for 1 device(s).
[  308.148756] NVRM: None of the NVIDIA devices were initialized.
[  308.148969] nvidia-nvlink: Unregistered the Nvlink Core, major device number 239

The installed NVIDIA packages are:

apt list --installed | grep -i nvidia

libnvidia-container-tools/bionic,now 1.7.0-1 amd64 [installed,automatic]
libnvidia-container1/bionic,now 1.7.0-1 amd64 [installed,automatic]
nvidia-container-toolkit/bionic,now 1.7.0-1 amd64 [installed]
nvidia-docker2/bionic,now 2.8.0-1 all [installed]
nvidia-fabricmanager-450/now 450.142.00-1 amd64 [installed,upgradable to: 450.156.00-0ubuntu0.18.04.1]

The drivers are not updated by a system update (I tried to unhold the packages and update the system, but that did not solve the issue):

apt-mark showhold
linux-aws
linux-headers-aws
linux-image-aws
nvidia-fabricmanager-450
tensorflow-model-server-neuron

Any idea what I could try to solve the issue?

Or do you know of another Deep Learning AMI that works with this g5.xlarge?

Thanks!

asked 2 years ago, 5026 views
1 answer

For EC2 G5 instances, you will need to use a Deep Learning AMI with CUDA 11.4 or later. References to those can be found in the Deep Learning AMI documentation.
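
To check whether a given AMI meets this requirement, one option is to compare the installed NVIDIA kernel module version against the minimum driver for CUDA 11.4. The sketch below is only an illustration: the `470.42.01` threshold is an assumption taken from NVIDIA's CUDA 11.4 release notes (not from this thread), `version_ge` is a hypothetical helper name, and it relies on `modinfo` and GNU `sort -V` being available.

```shell
#!/bin/sh
# Assumed minimum Linux driver for CUDA 11.4 (check NVIDIA's release notes).
MIN_DRIVER="470.42.01"

# Hypothetical helper: succeeds when version $1 >= version $2.
version_ge() {
    # sort -V orders version strings; -C only checks that input is sorted.
    printf '%s\n' "$2" "$1" | sort -V -C
}

# Read the version of the installed nvidia kernel module, if any.
installed="$(modinfo -F version nvidia 2>/dev/null || true)"

if [ -z "$installed" ]; then
    echo "No nvidia kernel module found"
elif version_ge "$installed" "$MIN_DRIVER"; then
    echo "Driver $installed meets the assumed minimum $MIN_DRIVER"
else
    echo "Driver $installed is older than the assumed minimum $MIN_DRIVER"
fi
```

If the driver matches the `nvidia-fabricmanager-450` package listed in the question (450.142.00), this check would report it as older than the threshold, which would be consistent with the probe failure shown in `dmesg`.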

AWS EXPERT, answered 2 years ago
AWS EXPERT, reviewed 2 years ago
