Purchased Amazon Linux 2 AMI with NVIDIA TESLA GPU Driver But ML Python App Not Using It


Greetings,

I purchased the Amazon Linux 2 AMI with NVIDIA TESLA GPU Driver from AWS Marketplace. However, after launching an Amazon Linux 2 EC2 instance from the subscribed AMI, it does not work as expected.

When I run a machine learning Python application, I get this message:

....
The NVIDIA driver on your system is too old (found version 11000). Please update your GPU driver by downloading and installing a new version from the URL: http://www.nvidia.com/Download/index.aspx Alternatively, go to: https://pytorch.org to install a PyTorch version that has been compiled with your version of the CUDA driver. (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:108.)
  return torch._C._cuda_getDeviceCount() > 0
Using cpu for inference.
....

I purchased this AMI to use the NVIDIA GPU driver and CUDA with my Python ML application. However, the NVIDIA driver in the AMI appears to be too old, as the message above shows.

When I run the ML Python app on the EC2 instance launched from the AMI I purchased from AWS Marketplace, I expect to see "Using CUDA for inference", not the current message "Using cpu for inference".

Here are some details about the EC2 environment:

Amazon Linux 2 EC2 instance launched from the purchased AMI with the NVIDIA driver.

$ nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.236.01 Driver Version: 450.236.01 CUDA Version: 11.0

[ec2-user@ip-10-0-0-44 ~]$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Wed_Apr_11_23:16:29_CDT_2018
Cuda compilation tools, release 9.2, V9.2.88

$ python3.11 --version
Python 3.11.3

>>> torch.__version__
'2.1.0+cu121'

Given the environment above, I expect the ML Python app on the EC2 instance with the AMI I purchased from AWS Marketplace to print "Using CUDA for inference", not "Using cpu for inference".

Any advice on how to resolve this issue?

Thank you!

1 Answer
Accepted Answer

Hello,

The AL2 AMI comes with a driver that supports CUDA version 11.0:

# nvidia-smi -q | head

==============NVSMI LOG==============

Timestamp                                 : Tue Oct 24 06:30:45 2023
Driver Version                            : 450.236.01
CUDA Version                              : 11.0

Attached GPUs                             : 1
GPU 00000000:00:1E.0
    Product Name                          : Tesla T4
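
If you want to read these two values from a script rather than by eye, the output above can be parsed directly. This is a minimal sketch, assuming the `Key : Value` layout shown in the log; `parse_nvidia_smi_q` is a hypothetical helper name, and the sample text is copied from the output above.

```python
import re


def parse_nvidia_smi_q(text: str) -> dict:
    """Extract the driver and CUDA versions from `nvidia-smi -q` text."""
    fields = {}
    for line in text.splitlines():
        # Lines of interest look like "Driver Version    : 450.236.01".
        match = re.match(r"^\s*(Driver Version|CUDA Version)\s*:\s*(\S+)\s*$", line)
        if match:
            fields[match.group(1)] = match.group(2)
    return fields


# Sample copied from the log above; on a real instance, capture the
# command output instead (for example with subprocess.run).
sample = """\
Timestamp                                 : Tue Oct 24 06:30:45 2023
Driver Version                            : 450.236.01
CUDA Version                              : 11.0
"""

info = parse_nvidia_smi_q(sample)
print(info)
```

The "CUDA Version" reported here is the highest CUDA runtime the installed driver supports, not the toolkit version that `nvcc` reports.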

The AMI ships with Python 3.7. However, you can upgrade to Python 3.8 using the steps here: https://repost.aws/questions/QUtA3qNBaLSvWPfD5kFwI0_w/python-3-10-on-ec2-running-amazon-linux-2-and-the-openssl-upgrade-requirement#ANJdWAfw8AQWmZyUh0RUjDVA

I then installed a PyTorch version that matches my CUDA version, chosen from the list here: https://pytorch.org/get-started/previous-versions/

Then I verified the installation using the following commands:

# python3 --version
Python 3.8.16

# python3
Python 3.8.16 (default, Aug 30 2023, 23:19:34)
[GCC 7.3.1 20180712 (Red Hat 7.3.1-15)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.cuda.is_available()
True
>>> torch.cuda.get_device_name(0)
'Tesla T4'
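
The interactive check above can also be wrapped in a small startup routine, so the application reports which device it will use. This is only a sketch: `pick_device` is a hypothetical helper, and treating a missing PyTorch install as "no GPU" is an assumption made here, not something the AMI or PyTorch prescribes.

```python
import importlib.util


def pick_device() -> str:
    """Return "cuda" when PyTorch can use a GPU, otherwise "cpu"."""
    # Assumption: a missing torch install is treated the same as "no GPU".
    if importlib.util.find_spec("torch") is None:
        return "cpu"
    import torch

    if torch.cuda.is_available():
        # Same calls as in the interpreter session above.
        print("GPU:", torch.cuda.get_device_name(0))
        return "cuda"
    return "cpu"


device = pick_device()
print(f"Using {device} for inference.")
```

With a PyTorch build that matches the driver, this should report cuda on the instance; with a mismatched build it falls back to cpu, which is the behavior described in the question.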

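I believe you are facing this issue because your installed PyTorch build is not compatible with CUDA 11.0. The version you reported, 2.1.0+cu121, is built for CUDA 12.1, while the driver on the AMI (450.236.01) supports CUDA up to 11.0, which is the "found version 11000" in the warning. Please install a PyTorch build that is compatible with CUDA 11.0.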
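
The mismatch can be checked from the two version strings alone. The sketch below assumes the `+cuXYZ` suffix convention visible in the reported PyTorch version and compares it with the CUDA version printed by `nvidia-smi`; both function names are hypothetical. The comparison is deliberately conservative: it ignores CUDA minor-version compatibility, so treat a False result as "likely to fail" rather than as a definitive verdict.

```python
import re


def cuda_from_torch_version(version: str):
    """Return (major, minor) from a tag like '2.1.0+cu121', or None."""
    match = re.search(r"\+cu(\d+)$", version)
    if match is None:
        return None  # CPU-only build or unrecognized tag
    digits = match.group(1)
    # The last digit is the minor version: 'cu121' -> (12, 1), 'cu110' -> (11, 0).
    return int(digits[:-1]), int(digits[-1])


def driver_supports(torch_version: str, driver_cuda: str) -> bool:
    """Conservative check: the build's CUDA must not exceed the driver's CUDA."""
    wheel_cuda = cuda_from_torch_version(torch_version)
    if wheel_cuda is None:
        return False  # no CUDA tag, so the GPU would not be used anyway
    major, minor = (int(part) for part in driver_cuda.split(".")[:2])
    return wheel_cuda <= (major, minor)


# Values from this thread: the reported PyTorch build and the AMI's CUDA version.
print(driver_supports("2.1.0+cu121", "11.0"))
# An example of a build tagged for CUDA 11.0.
print(driver_supports("1.7.1+cu110", "11.0"))
```

On a machine with PyTorch installed, the first argument can be taken from `torch.__version__` and the second from the `nvidia-smi` output.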

If you would like to use a different version of CUDA, please follow the steps here: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/install-nvidia-driver.html#gpu-instance-install-cuda

If you would rather replace the NVIDIA drivers entirely, first remove the existing NVIDIA drivers with this command:

sudo yum erase nvidia cuda

Then install using the steps provided here: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/install-nvidia-driver.html#nvidia-GRID-driver

AWS SUPPORT ENGINEER, answered 6 months ago
