GPU fails to initialize on g5.xlarge instance


Hello,

I have tried to create several g5.xlarge instances with various "quickstart" AMIs (Deep Learning AMI GPU TensorFlow 2.7.0 (Amazon Linux 2) 20211111 - ami-0850c76a5926905fb, Deep Learning AMI (Ubuntu 18.04) Version 54.0, ...).

In all cases, the instance boots fine and both status checks pass, but the GPU is not accessible.

For example, with Deep Learning AMI (Ubuntu 18.04) Version 54.0:

nvidia-smi gives the following error:

nvidia-smi
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

With 'dmesg' we can see the following errors:

[  308.148743] nvidia: probe of 0000:00:1e.0 failed with error -1
[  308.148756] NVRM: The NVIDIA probe routine failed for 1 device(s).
[  308.148756] NVRM: None of the NVIDIA devices were initialized.
[  308.148969] nvidia-nvlink: Unregistered the Nvlink Core, major device number 239

The installed NVIDIA packages are:

apt list --installed | grep -i nvidia

libnvidia-container-tools/bionic,now 1.7.0-1 amd64 [installed,automatic]
libnvidia-container1/bionic,now 1.7.0-1 amd64 [installed,automatic]
nvidia-container-toolkit/bionic,now 1.7.0-1 amd64 [installed]
nvidia-docker2/bionic,now 2.8.0-1 all [installed]
nvidia-fabricmanager-450/now 450.142.00-1 amd64 [installed,upgradable to: 450.156.00-0ubuntu0.18.04.1]

The drivers are not updated by a system update (I tried un-holding the packages and updating the system, but that did not solve the issue). The packages on hold are:

apt-mark showhold
linux-aws
linux-headers-aws
linux-image-aws
nvidia-fabricmanager-450
tensorflow-model-server-neuron

Any idea what I could try to solve the issue?

Or do you know of another Deep Learning AMI that works with g5.xlarge?

Thanks!

Asked 2 years ago, viewed 5026 times
1 Answer

For EC2 G5 instances, you will need to use a Deep Learning AMI with CUDA 11.4 or later. References to those can be found in the Deep Learning AMI documentation.
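
One way to check whether an AMI meets this requirement is to compare the installed NVIDIA kernel module version against a minimum driver branch. The sketch below is only an illustration: the threshold of 470 is an assumption (the driver branch that accompanies CUDA 11.4), and the function names `version_ge`, `installed_driver_version`, and `check_driver` are hypothetical helpers, not part of any AWS or NVIDIA tool. It uses `modinfo` rather than `nvidia-smi` because `nvidia-smi` cannot report anything when the driver fails to initialize.

```shell
#!/usr/bin/env bash
# Sketch: is the installed NVIDIA driver at least a given branch?
# MIN_DRIVER is an assumption (470, the branch paired with CUDA 11.4); adjust as needed.
MIN_DRIVER="${MIN_DRIVER:-470}"

# Return 0 if version $1 >= version $2, using GNU sort's version ordering.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Print the version of the nvidia kernel module installed for the running
# kernel, or nothing if no such module (or no modinfo) is available.
installed_driver_version() {
    modinfo -F version nvidia 2>/dev/null
}

# Report whether version $1 satisfies MIN_DRIVER.
check_driver() {
    local ver="$1"
    if [ -z "$ver" ]; then
        echo "no NVIDIA kernel module found"
        return 2
    fi
    if version_ge "$ver" "$MIN_DRIVER"; then
        echo "driver $ver is at least $MIN_DRIVER"
    else
        echo "driver $ver is older than $MIN_DRIVER"
        return 1
    fi
}

check_driver "$(installed_driver_version)" || true
```

For the AMI in the question, the only versioned driver component listed is nvidia-fabricmanager-450 at 450.142.00, so `check_driver 450.142.00` would report a driver older than the assumed minimum.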

AWS
Expert
Answered 2 years ago
AWS
Expert
Reviewed 2 years ago
