GPU fails to initialize on g5.xlarge instance

Hello,

I have tried to create several g5.xlarge instances with various "quickstart" AMIs (Deep Learning AMI GPU TensorFlow 2.7.0 (Amazon Linux 2) 20211111 - ami-0850c76a5926905fb, Deep Learning AMI (Ubuntu 18.04) Version 54.0, ...).

In all cases, the instance boots fine and both status checks pass, but the GPU is not accessible.

For example, with the Deep Learning AMI (Ubuntu 18.04) Version 54.0:

nvidia-smi gives the following error:

nvidia-smi
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

With 'dmesg' we can see the following errors:

[  308.148743] nvidia: probe of 0000:00:1e.0 failed with error -1
[  308.148756] NVRM: The NVIDIA probe routine failed for 1 device(s).
[  308.148756] NVRM: None of the NVIDIA devices were initialized.
[  308.148969] nvidia-nvlink: Unregistered the Nvlink Core, major device number 239

The installed NVIDIA packages are:

apt list --installed | grep -i nvidia

libnvidia-container-tools/bionic,now 1.7.0-1 amd64 [installed,automatic]
libnvidia-container1/bionic,now 1.7.0-1 amd64 [installed,automatic]
nvidia-container-toolkit/bionic,now 1.7.0-1 amd64 [installed]
nvidia-docker2/bionic,now 2.8.0-1 all [installed]
nvidia-fabricmanager-450/now 450.142.00-1 amd64 [installed,upgradable to: 450.156.00-0ubuntu0.18.04.1]

The drivers are not updated by a system update (I tried to unhold the packages and update the system, but that did not solve the issue). The held packages are:

apt-mark showhold
linux-aws
linux-headers-aws
linux-image-aws
nvidia-fabricmanager-450
tensorflow-model-server-neuron

Any idea what I could try to solve the issue?

Or do you know of another Deep Learning AMI that works with g5.xlarge?

Thanks !

Asked 2 years ago, 5026 views
1 Answer

For EC2 G5 instances, you will need to use a Deep Learning AMI with CUDA 11.4 or later. References to those can be found in the Deep Learning AMI documentation.
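
To check whether an instance already meets this requirement, something like the following sketch may help. It assumes that CUDA 11.4 corresponds to the 470 driver branch, so it treats 470 as the minimum major version; that threshold and the helper names `driver_major` and `driver_is_new_enough` are illustrative assumptions, not taken from the AWS documentation.

```shell
#!/usr/bin/env bash
# Assumption: CUDA 11.4 shipped with the 470 driver branch, so 470 is
# used here as the minimum major version. Adjust if AWS docs say otherwise.
MIN_DRIVER_MAJOR=470

# Print the major number of a version string such as "450.142.00-1".
driver_major() {
  printf '%s\n' "${1%%.*}"
}

# Exit status 0 when the given version meets the assumed minimum.
driver_is_new_enough() {
  local major
  major=$(driver_major "$1")
  [ "$major" -ge "$MIN_DRIVER_MAJOR" ]
}

# Query the installed kernel module version, if modinfo can find one.
if ver=$(modinfo -F version nvidia 2>/dev/null) && [ -n "$ver" ]; then
  if driver_is_new_enough "$ver"; then
    echo "Driver $ver meets the assumed minimum ($MIN_DRIVER_MAJOR)."
  else
    echo "Driver $ver is older than the assumed minimum ($MIN_DRIVER_MAJOR)."
  fi
else
  echo "No nvidia kernel module found by modinfo."
fi
```

In the question above, the only versioned NVIDIA package listed is nvidia-fabricmanager-450 at 450.142.00-1. If the driver on that AMI is from the same branch, this check would report it as too old.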

AWS (Expert)
Answered 2 years ago
Reviewed by AWS (Expert) 2 years ago
