Why is the GPU not working out of the box on a Deep Learning AMI EC2 instance?

0

I'm having trouble using the GPU on a Deep Learning AMI GPU EC2 instance. The instance uses the following AMI:

  • Deep Learning AMI GPU PyTorch 1.11.0 (Amazon Linux 2) 20220328
  • amazon/Deep Learning AMI GPU PyTorch 1.11.0 (Amazon Linux 2) 20220328

When I log into the instance and run nvidia-smi, I get the error:

NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

Similarly, when I use the pre-installed PyTorch to check whether it can see a GPU, it returns False:

(pytorch) [ec2-user@ip-172-31-86-58 ~]$ python3
Python 3.9.12 | packaged by conda-forge | (main, Mar 24 2022, 23:25:59)
[GCC 10.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.cuda.is_available()
False

The GPU setup should have worked out of the box. How do I fix this?
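
For anyone reproducing this, the two checks described in the question (running nvidia-smi and asking PyTorch whether CUDA is available) can be combined into one script. This is only a minimal sketch: the function name `gpu_diagnostics` and the report keys are made up for illustration, and it assumes that nvidia-smi exits with a non-zero status when it cannot reach the driver.

```python
import shutil
import subprocess


def gpu_diagnostics():
    """Run both checks from the question and return the results as a dict.

    The function name and the report keys are hypothetical, chosen for this sketch.
    """
    report = {
        "nvidia_smi_found": False,
        "nvidia_smi_ok": False,
        "nvidia_smi_output": "",
        "torch_cuda_available": None,  # None means torch could not be imported
    }

    # Check 1: is nvidia-smi on PATH, and does it run successfully?
    # Assumption: a non-zero exit status means it could not talk to the driver.
    exe = shutil.which("nvidia-smi")
    if exe is not None:
        report["nvidia_smi_found"] = True
        try:
            proc = subprocess.run([exe], capture_output=True, text=True, timeout=30)
            report["nvidia_smi_ok"] = proc.returncode == 0
            report["nvidia_smi_output"] = (proc.stdout or proc.stderr).strip()
        except (OSError, subprocess.TimeoutExpired) as exc:
            report["nvidia_smi_output"] = f"could not run nvidia-smi: {exc}"

    # Check 2: does PyTorch see a CUDA device?
    try:
        import torch

        report["torch_cuda_available"] = torch.cuda.is_available()
    except (ImportError, OSError):
        pass  # torch is missing or failed to load; leave the value as None

    return report


if __name__ == "__main__":
    for key, value in gpu_diagnostics().items():
        print(f"{key}: {value}")
```

Run it inside the activated (pytorch) environment. If nvidia-smi fails and PyTorch also reports False, the problem is more likely at the driver or instance level than in PyTorch itself.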

asked 2 years ago
5620 views
3 Answers
2

Hello,

Thank you for contacting us and for using AWS Deep learning AMI.

Please note that this nvidia-smi error generally occurs when the Deep Learning AMI GPU PyTorch 1.11.0 (Amazon Linux 2) 20220328 is launched on an unsupported instance type.

As per https://aws.amazon.com/releasenotes/aws-deep-learning-ami-gpu-pytorch-1-11-amazon-linux-2/ , the supported instance types for this AMI are: G3, P3, P3dn, P4d, G5, G4dn.

I got the same errors you describe when I tried an unsupported EC2 instance type. Switching to a supported instance type resolved the issue for me. Can you try the same on your end?
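
To make the check concrete, here is a small sketch that compares an instance type string against the families listed in the release notes quoted in this answer. The names `SUPPORTED_FAMILIES`, `instance_family`, and `is_listed` are made up for illustration. The comparison is an exact match on the part before the dot, which is an assumption about how the list should be read.

```python
# Families copied from the release notes quoted in this answer, lowercased.
SUPPORTED_FAMILIES = {"g3", "p3", "p3dn", "p4d", "g5", "g4dn"}


def instance_family(instance_type):
    """Return the part before the dot, e.g. 'g4dn' for 'g4dn.2xlarge'."""
    family, sep, size = instance_type.strip().lower().partition(".")
    if not family or not sep or not size:
        raise ValueError(f"not an instance type: {instance_type!r}")
    return family


def is_listed(instance_type, supported=SUPPORTED_FAMILIES):
    """True if the instance family exactly matches one of the listed families."""
    return instance_family(instance_type) in supported


if __name__ == "__main__":
    for candidate in ("g4dn.2xlarge", "p3.2xlarge", "p2.xlarge"):
        print(candidate, is_listed(candidate))
```

Because the match is exact, a variant such as g3s.xlarge is not matched by the "g3" entry. The release notes quoted here do not say how such variants are grouped, so treat a negative result as a prompt to check the release notes rather than as a final answer.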

If you need further help, I recommend opening a support case and providing more information about your Deep Learning AMI EC2 instance there, as we cannot discuss account-specific issues in public posts. You can open a support case with AWS using this link:

https://console.aws.amazon.com/support/home?#/case/create

Thanks :)

AWS
SUPPORT ENGINEER
answered 2 years ago
  • Hey, I have the same problem. I have a g3s.xlarge instance in eu-west-2 with the image Deep Learning OSS Nvidia Driver AMI GPU PyTorch 1.13.1 (Amazon Linux 2) 20240123. Running "nvidia-smi" gives the error "NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running." and PyTorch does not discover any GPUs.

1

Yup, same issue here. The documentation is inconsistent:

Deep Learning Proprietary Nvidia Driver AMI GPU TensorFlow 2.13 (Amazon Linux 2)

Please use the following command to activate the TensorFlow 2.13 Pip Environment:

TensorFlow 2.13: source /opt/tensorflow/bin/activate

  • Supported EC2 instances: P3, P3dn, G3, G5, G4dn.
  • For P4 EC2 instances, please use OSS Nvidia Driver DLAMI.
  • EC2 P2 Instance is not supported on current DLAMI.

NVIDIA driver version: 535.129.03
CUDA version: 11.8

AWS Deep Learning AMI Homepage: https://aws.amazon.com/machine-learning/amis/
Release Notes: https://docs.aws.amazon.com/dlami/latest/devguide/appendix-ami-release-notes.html
Support: https://forums.aws.amazon.com/forum.jspa?forumID=263
For a fully managed experience, check out Amazon SageMaker at https://aws.amazon.com/sagemaker
Security scan reports for python packages are located at: /opt/aws/dlami/info/

source /opt/tensorflow/bin/activate

Error: Note that the Amazon EC2 g4dn.2xlarge instance type is not supported by current Deep Learning AMI.

Please refer the DLAMI release notes https://docs.aws.amazon.com/dlami/latest/devguide/appendix-ami-release-notes.html for more information.

answered 3 months ago
0

I have used multiple instances with Deep Learning OSS Nvidia Driver AMI GPU PyTorch 2.1.0 (Ubuntu 20.04) 20240116. I have requested quota for G and P instance types, and I get the same problem!

Andrew
answered 3 months ago
