Issue with NVIDIA Driver Installation on EC2 g4dn.xlarge Instance


Hello AWS Community,

I am encountering an issue while attempting to install the NVIDIA driver on my EC2 server, which is using the g4dn.xlarge instance type. Despite the g4dn.xlarge instance being equipped with NVIDIA GPU support, I am receiving the following error message during the driver installation process:

RuntimeError: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx

I have attempted to download and install the NVIDIA driver from https://www.nvidia.com/Download/driverResults.aspx/210649/en-us/. However, the installation process is encountering multiple errors:

  1. The installation begins with the message "Building kernel modules," which completes at 100% after 15 to 20 minutes.

  2. The next page displays the error message:

ERROR: An error occurred while performing the step: "Building kernel modules". See /var/log/nvidia-installer.log for details.

  3. After clicking the OK button, the console proceeds to "Checking to see whether the nvidia kernel module was successfully built," but encounters another error:

ERROR: An error occurred while performing the step: "Checking to see whether the nvidia kernel module was successfully built". See /var/log/nvidia-installer.log for details.

  4. Upon pressing the OK button again, I receive the message:

ERROR: The nvidia kernel module was not created.

  5. The final page presents the error message:

ERROR: Installation has failed. Please see the file '/var/log/nvidia-installer.log' for details. You may find suggestions on fixing installation problems in the README available on the Linux driver download page at www.nvidia.com.

  6. I checked the log file; the relevant errors are below:

cc1: some warnings being treated as errors
make[5]: *** [/tmp/selfgz8675/NVIDIA-Linux-x86_64-515.105.01/kernel/nvidia/i2c_nvswitch.o] Error 1
make[5]: Target `__build' not remade because of errors.
make[4]: *** [/tmp/selfgz8675/NVIDIA-Linux-x86_64-515.105.01/kernel] Error 2
make[4]: Target `modules' not remade because of errors.
make[3]: *** [modules] Error 2
make[2]: *** [__sub-make] Error 2
make[2]: Target `modules' not remade because of errors.
make[2]: Leaving directory `/usr/src/kernels/5.10.186-179.751.amzn2.x86_64'
make[1]: *** [modules] Error 2
make[1]: Leaving directory `/usr/src/kernels/5.10.186-179.751.amzn2.x86_64'
make: *** [modules] Error 2
ERROR: The nvidia kernel module was not created.
ERROR: Installation has failed. Please see the file '/var/log/nvidia-installer.log' for details. You may find suggestions on fixing installation problems in the README available on the Linux driver download page at www.nvidia.com.
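
The log excerpt above contains only the summary from make; the compiler diagnostics that triggered "some warnings being treated as errors" normally appear earlier in /var/log/nvidia-installer.log. Below is a minimal Python sketch for pulling those lines out, assuming they follow GCC's usual "file:line:col: error: ..." format. The name find_compile_errors is hypothetical, chosen only for illustration.

```python
import re

# GCC reports compile failures as "<file>:<line>:<col>: error: <message>".
# The match is case-sensitive on purpose, so the installer's own
# "ERROR: ..." summary lines and make's "Error 1" lines are skipped.
ERROR_LINE = re.compile(r"\berror:")


def find_compile_errors(log_text, context=2):
    """Return each line containing 'error:' together with up to
    `context` preceding lines, joined into one string per match."""
    lines = log_text.splitlines()
    hits = []
    for i, line in enumerate(lines):
        if ERROR_LINE.search(line):
            start = max(0, i - context)
            hits.append("\n".join(lines[start:i + 1]))
    return hits
```

To use it, read the contents of /var/log/nvidia-installer.log (this may require root) and pass the text to find_compile_errors. The first match is usually the most informative one to include in a question like this.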

I have also attempted to install an alternative driver from https://www.nvidia.com/Download/driverResults.aspx/200625/en-us/, but encountered the same issue.

Could someone please advise on how to resolve this NVIDIA driver installation problem on the g4dn.xlarge instance? Any assistance would be greatly appreciated.

1 Answer

Hi,

Take a look at this page, which describes several options for installing NVIDIA drivers on EC2 Linux instances.

https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/install-nvidia-driver.html

For g4dn.xlarge, you have 3 options available:

Instance type    Tesla driver    GRID driver    Gaming driver
G4dn             Yes             Yes            Yes
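
Whichever option you pick, the installer has to compile a kernel module, which needs a compiler and the headers for the kernel that is currently running. Here is a minimal Python sketch of a pre-install check, assuming the headers live under /usr/src/kernels/<release> as in the log from the question. check_build_prereqs is a hypothetical helper, not part of any AWS or NVIDIA tool.

```python
import os
import platform
import shutil


def check_build_prereqs(release=None, headers_root="/usr/src/kernels"):
    """Report whether kernel headers for `release` and a gcc binary exist.

    `release` defaults to the running kernel (same value as `uname -r`).
    `headers_root` is an assumption taken from the path in the log.
    """
    release = release or platform.release()
    headers_dir = os.path.join(headers_root, release)
    return {
        "kernel_release": release,
        "headers_dir": headers_dir,
        "headers_present": os.path.isdir(headers_dir),
        "gcc_path": shutil.which("gcc"),  # None when gcc is not on PATH
    }
```

If headers_present is False or gcc_path is None, install the missing packages before rerunning the installer. If both are present and the build still fails, one possible cause is a compiler that differs from the one used to build the running kernel; the log excerpt alone does not confirm this.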

Best,

Didier

AWS EXPERT, answered 7 months ago
