Why is the GPU not working out of the box on a Deep Learning AMI EC2 instance?


I'm having trouble using the GPU on a Deep Learning AMI GPU EC2 instance. The specs of the instance are:

  • Deep Learning AMI GPU PyTorch 1.11.0 (Amazon Linux 2) 20220328
  • amazon/Deep Learning AMI GPU PyTorch 1.11.0 (Amazon Linux 2) 20220328

When I log into the instance and run nvidia-smi, I get the error:

NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

Similarly, when I use the pre-installed PyTorch to check whether it can see a GPU, it returns False:

(pytorch) [ec2-user@ip-172-31-86-58 ~]$ python3
Python 3.9.12 | packaged by conda-forge | (main, Mar 24 2022, 23:25:59)
[GCC 10.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.cuda.is_available()
False


The GPU setup should have worked out of the box. How do I fix this?

1 Answer


Thank you for contacting us and for using the AWS Deep Learning AMI.

This nvidia-smi error generally occurs when the Deep Learning AMI GPU PyTorch 1.11.0 (Amazon Linux 2) 20220328 is launched on an instance type that it does not support.

The supported instance types for this AMI are: G3, P3, P3dn, P4d, G5, G4dn

I got the same errors you describe when I tried an unsupported EC2 instance type. Switching to one of the supported instance types resolved the issue for me. Can you try the same on your end?
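As a rough way to check this from the instance itself, the sketch below compares the instance type against the families listed above. The function names are made up for illustration, and the family match simply takes the part of the type before the dot, so variants such as g3s would need to be added by hand. The lookup uses the EC2 instance metadata service (IMDSv2) and will only respond when run on an EC2 instance.

```python
import urllib.request

# Families named in this answer for the AMI (lowercased). This is an
# assumption copied from the list above, not an authoritative source.
SUPPORTED_FAMILIES = {"g3", "p3", "p3dn", "p4d", "g5", "g4dn"}

IMDS = "http://169.254.169.254/latest"


def instance_family(instance_type: str) -> str:
    """Return the family part of a type, e.g. 'g4dn.xlarge' -> 'g4dn'."""
    return instance_type.strip().lower().split(".", 1)[0]


def is_supported(instance_type: str) -> bool:
    """True if the type's family appears in SUPPORTED_FAMILIES."""
    return instance_family(instance_type) in SUPPORTED_FAMILIES


def current_instance_type(timeout: float = 2.0) -> str:
    """Ask the instance metadata service for this machine's type (EC2 only)."""
    # IMDSv2: first request a short-lived session token.
    token_req = urllib.request.Request(
        f"{IMDS}/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "60"},
    )
    with urllib.request.urlopen(token_req, timeout=timeout) as resp:
        token = resp.read().decode()
    # Then read the instance type using that token.
    type_req = urllib.request.Request(
        f"{IMDS}/meta-data/instance-type",
        headers={"X-aws-ec2-metadata-token": token},
    )
    with urllib.request.urlopen(type_req, timeout=timeout) as resp:
        return resp.read().decode()
```

On the instance, calling is_supported(current_instance_type()) should return True for a type in the list above. If it returns False, the instance type is the first thing to change.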

If you need further help, I recommend opening a support case with AWS and providing more information about your Deep Learning AMI EC2 instance there, as we cannot discuss account-specific issues in public posts.

Thanks :)

answered 4 months ago

