
Install NVIDIA GPU driver, CUDA Toolkit, NVIDIA Container Toolkit on Amazon EC2 instances running Ubuntu Linux

10 minute read
Content level: Expert

Steps to install the NVIDIA driver, CUDA Toolkit, NVIDIA Container Toolkit, and other NVIDIA software from the NVIDIA repository on Ubuntu 24.04 / 22.04 (x86_64/arm64)

Overview

This article describes how to install the NVIDIA GPU driver, CUDA Toolkit, NVIDIA Container Toolkit, and other NVIDIA software directly from the NVIDIA repository on NVIDIA GPU EC2 instances running Ubuntu on AWS.

Note that by using this method, you agree to the NVIDIA Driver License Agreement, End User License Agreement, and other related license agreements. If you are doing development, you may want to register for the NVIDIA Developer Program.

Pre-built AMIs

If you need AMIs preconfigured with NVIDIA GPU driver, CUDA, other NVIDIA software, and optionally PyTorch or TensorFlow framework, consider AWS Deep Learning AMIs. Refer to Release notes for DLAMIs for currently supported options, and Deep Learning graphical desktop on Ubuntu Linux with AWS Deep Learning AMI (DLAMI) for graphical desktop setup guidance.

For container workloads, consider Amazon ECS-optimized Linux AMIs and Amazon EKS optimized AMIs.

Note: instructions in this article are not applicable to pre-built AMIs.

Custom ECS GPU-optimized AMI

If you wish to build your own custom Amazon ECS GPU-optimized AMI, install the NVIDIA driver, Docker, and the NVIDIA Container Toolkit, then refer to How do I create and use custom AMIs in Amazon ECS? and Installing the Amazon ECS container agent.

About CUDA toolkit

The CUDA Toolkit is generally optional when a GPU instance is used to run applications rather than develop them, because CUDA applications typically package the CUDA runtime and the libraries they need by statically or dynamically linking against them.
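
To check whether an existing application already bundles or dynamically links the CUDA runtime, you can inspect its shared library dependencies. A minimal sketch, using a hypothetical binary name ./my_cuda_app:

# List CUDA-related libraries a binary links against (./my_cuda_app is a placeholder)
ldd ./my_cuda_app | grep -E 'libcuda|libcudart|libcublas'
# libcuda.so is provided by the GPU driver; libcudart and other toolkit libraries
# either appear here (dynamic linking) or are built into the binary (static linking)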

System Requirements

NVIDIA CUDA supports the following platforms

  • Ubuntu Linux 24.04 (x86_64 and arm64)
  • Ubuntu Linux 22.04 (x86_64 and arm64)

Refer to the Driver installation guide for supported kernel versions, compilers, and libraries.

Prepare Ubuntu Linux

Launch a new NVIDIA GPU instance running Ubuntu Linux, preferably with at least 20 GB of storage, and connect to the instance.

Update the OS, and install DKMS, kernel headers, and development packages

sudo apt update
sudo apt upgrade -y
sudo apt autoremove -y
sudo apt install -y dkms linux-headers-aws linux-modules-extra-aws unzip gcc make libglvnd-dev pkg-config

Restart your EC2 instance if the kernel was updated.
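
Before rebooting, you can check whether the upgrade actually staged a new kernel; a minimal sketch that relies on the reboot flag file Ubuntu creates after kernel updates:

# Reboot only if Ubuntu reports that a restart is required (e.g. after a kernel update)
if [ -f /var/run/reboot-required ]; then
  cat /var/run/reboot-required
  sudo reboot
fi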

Add NVIDIA repository

Configure Network Repo installation

DISTRO=$(. /etc/os-release;echo $ID$VERSION_ID | sed -e 's/\.//g')
if (arch | grep -q x86); then
  ARCH=x86_64
else
  ARCH=sbsa
fi
cd /tmp
curl -L -O https://developer.download.nvidia.com/compute/cuda/repos/$DISTRO/$ARCH/cuda-keyring_1.1-1_all.deb
sudo apt install -y ./cuda-keyring_1.1-1_all.deb
sudo apt update 

Install NVIDIA Driver

To install the latest Tesla driver

sudo apt install -y nvidia-open nvidia-xconfig

To install a specific version, e.g. 570

sudo apt install -y nvidia-open-570

The commands above install the open-source NVIDIA kernel module. Refer to the Driver Installation Guide for details about NVIDIA kernel modules and installation options.
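
To confirm which module flavor is installed, you can inspect the module's license field; this sketch assumes the open kernel module reports a dual MIT/GPL license, while the proprietary module reports an NVIDIA license:

# Print the license field of the installed nvidia kernel module
modinfo -F license nvidia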

Verify

Restart your instance

nvidia-smi

Output should be similar to below

Sat Apr 19 02:54:25 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.124.06             Driver Version: 570.124.06     CUDA Version: 12.8     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  Tesla T4                       Off |   00000000:00:1E.0 Off |                    0 |
| N/A   26C    P8             13W /   70W |       1MiB /  15360MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

Optional: CUDA Toolkit

To install the latest CUDA Toolkit

sudo apt install -y cuda-toolkit

To install a specific version, e.g. 12.8

sudo apt install -y cuda-toolkit-12-8

Refer to the CUDA Toolkit documentation for supported platforms and installation options.

Verify

/usr/local/cuda/bin/nvcc -V

Output should be similar to below

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2025 NVIDIA Corporation
Built on Fri_Feb_21_20:23:50_PST_2025
Cuda compilation tools, release 12.8, V12.8.93
Build cuda_12.8.r12.8/compiler.35583870_0

Post-installation Actions

Refer to the NVIDIA CUDA Installation Guide for Linux for post-installation actions required before the CUDA Toolkit can be used. For example, you may want to add /usr/local/cuda/bin to your PATH variable as described in Post-installation Actions: Mandatory Actions.
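
A minimal sketch of the mandatory PATH action, assuming the default /usr/local/cuda symlink created by the toolkit packages:

# Add CUDA binaries to PATH for the current user and persist the change
echo 'export PATH=/usr/local/cuda/bin:$PATH' >> ~/.bashrc
source ~/.bashrc
nvcc -V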

Optional: NVIDIA Container Toolkit

The NVIDIA Container Toolkit supports Ubuntu on both x86_64 and arm64. For arm64, use a g5g.2xlarge or larger instance size, as g5g.xlarge may cause failures due to limited system memory.

To install the latest NVIDIA Container Toolkit

sudo apt install -y nvidia-container-toolkit

Refer to the NVIDIA Container Toolkit documentation for supported platforms, prerequisites, and installation options.

Verify

nvidia-container-cli -V

Output should be similar to below

cli-version: 1.17.5
lib-version: 1.17.5
build date: 2025-03-07T15:46+00:00
build revision: f23e5e55ea27b3680aef363436d4bcf7659e0bfc
build compiler: x86_64-linux-gnu-gcc-7 7.5.0
build platform: x86_64
build flags: -D_GNU_SOURCE -D_FORTIFY_SOURCE=2 -DNDEBUG -std=gnu11 -O2 -g -fdata-sections -ffunction-sections -fplan9-extensions -fstack-protector -fno-strict-aliasing -fvisibility=hidden -Wall -Wextra -Wcast-align -Wpointer-arith -Wmissing-prototypes -Wnonnull -Wwrite-strings -Wlogical-op -Wformat=2 -Wmissing-format-attribute -Winit-self -Wshadow -Wstrict-prototypes -Wunreachable-code -Wconversion -Wsign-conversion -Wno-unknown-warning-option -Wno-format-extra-args -Wno-gnu-alignof-expression -Wl,-zrelro -Wl,-znow -Wl,-zdefs -Wl,--gc-sections

Container engine configuration

Refer to the NVIDIA Container Toolkit documentation about container engine configuration.

Install and configure Docker

To install and configure Docker

sudo apt install -y docker.io
sudo usermod -aG docker ubuntu
sudo systemctl enable docker

sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
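
The nvidia-ctk command registers the NVIDIA runtime in Docker's daemon configuration. You can inspect the result; the file should contain an entry similar to the one shown in the comments (exact content may vary by toolkit version):

# Show the Docker daemon configuration written by nvidia-ctk
cat /etc/docker/daemon.json
# Expected to contain a "runtimes" entry similar to:
# {
#     "runtimes": {
#         "nvidia": {
#             "args": [],
#             "path": "nvidia-container-runtime"
#         }
#     }
# }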

Verify Docker engine configuration

To verify the Docker configuration

sudo docker run --rm --runtime=nvidia --gpus all public.ecr.aws/ubuntu/ubuntu:latest nvidia-smi

Output should be similar to below

Unable to find image 'public.ecr.aws/ubuntu/ubuntu:latest' locally
latest: Pulling from ubuntu/ubuntu
440a90d6b31c: Pull complete 
Digest: sha256:53b9e8d5b40d75d40e41b8776e468b0f7713ca3604e78981be28f0ba9843a316
Status: Downloaded newer image for public.ecr.aws/ubuntu/ubuntu:latest
Sat Apr 19 02:56:02 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.124.06             Driver Version: 570.124.06     CUDA Version: 12.8     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  Tesla T4                       Off |   00000000:00:1E.0 Off |                    0 |
| N/A   22C    P8              9W /   70W |       1MiB /  15360MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

Install on EC2 instance at launch

To install the NVIDIA driver and NVIDIA Container Toolkit, including Docker, when launching a new GPU instance with at least 20 GB of storage, you can use the following as a user data script. Uncomment the line ending with cuda-toolkit if you want to install the CUDA Toolkit.

#!/bin/bash
export DEBIAN_FRONTEND=noninteractive
sudo apt update
sudo apt upgrade -y
sudo apt autoremove -y

sudo apt install -y dkms linux-headers-aws linux-modules-extra-aws unzip gcc make libglvnd-dev pkg-config

DISTRO=$(. /etc/os-release;echo $ID$VERSION_ID | sed -e 's/\.//g')
if (arch | grep -q x86); then
  ARCH=x86_64
else
  ARCH=sbsa
fi
cd /tmp
curl -L -O https://developer.download.nvidia.com/compute/cuda/repos/$DISTRO/$ARCH/cuda-keyring_1.1-1_all.deb
sudo apt install -y ./cuda-keyring_1.1-1_all.deb
sudo apt update

sudo apt install -y nvidia-open nvidia-xconfig

# sudo apt install -y cuda-toolkit

sudo apt install -y docker.io
sudo usermod -aG docker ubuntu
sudo systemctl enable docker

sudo apt install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

sudo reboot
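
For example, you could pass the script above when launching an instance with the AWS CLI; in this sketch the AMI ID, key pair, and security group are placeholders you must replace with your own values:

# Launch a GPU instance with the user data script saved as install-nvidia.sh
aws ec2 run-instances \
  --image-id ami-xxxxxxxx \
  --instance-type g4dn.xlarge \
  --key-name my-key \
  --security-group-ids sg-xxxxxxxx \
  --block-device-mappings 'DeviceName=/dev/sda1,Ebs={VolumeSize=20}' \
  --user-data file://install-nvidia.sh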

Verify

Connect to your EC2 instance.

nvidia-smi
/usr/local/cuda/bin/nvcc -V
nvidia-container-cli -V
sudo docker run --rm --runtime=nvidia --gpus all public.ecr.aws/ubuntu/ubuntu:latest nvidia-smi

View /var/log/cloud-init-output.log to troubleshoot any installation issues.

Perform post-installation actions in order to use the CUDA Toolkit. To verify the integrity of the installation, you can download, compile, and run CUDA samples such as deviceQuery.
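
A sketch of building deviceQuery from the NVIDIA cuda-samples repository; the repository layout and build system vary between releases, so this assumes a release tag that still ships per-sample Makefiles (for example v12.4.1; newer tags build with CMake from the repository root):

# Build and run the deviceQuery CUDA sample (tag and path assume the pre-CMake layout)
sudo apt install -y git
git clone --branch v12.4.1 --depth 1 https://github.com/NVIDIA/cuda-samples.git
cd cuda-samples/Samples/1_Utilities/deviceQuery
make
./deviceQuery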

(Screenshot: deviceQuery output on Ubuntu Linux 24.04, g4dn instance)

If Docker and the NVIDIA Container Toolkit (but not the CUDA Toolkit) are installed and configured, you can use the CUDA samples container image to validate the CUDA driver.

sudo docker run --rm --runtime=nvidia --gpus all nvcr.io/nvidia/k8s/cuda-sample:devicequery

(Screenshot: Ubuntu CUDA driver validation output from the CUDA sample container)

GUI (graphical desktop) remote access

If you need remote graphical desktop access, refer to Install GUI (graphical desktop) on Amazon EC2 instances running Ubuntu Linux.

Note that this article installs the NVIDIA Tesla driver (also known as the NVIDIA Data Center Driver), which is intended primarily for GPU compute workloads. If configured in xorg.conf, Tesla drivers support one display with a resolution of up to 2560x1600.

GRID drivers provide access to four 4K displays per GPU and are certified to provide optimal performance for professional visualization applications. AMIs preconfigured with GRID drivers are available from AWS Marketplace. You can also consider using amazon-ec2-nice-dcv-samples CloudFormation templates to provision your own EC2 instances with either NVIDIA Tesla or GRID driver, Docker with NVIDIA Container Toolkit, graphical desktop environment and Amazon DCV remote display protocol server.

Other software

AWS CLI

To install AWS CLI (AWS Command Line Interface) v2 through Snap

sudo snap install aws-cli --classic

Verify

aws --version

Output should be similar to below

aws-cli/2.26.5 Python/3.13.2 Linux/6.8.0-1027-aws exe/x86_64.ubuntu.24

cuDNN (CUDA Deep Neural Network library)

To install cuDNN for the latest available CUDA version.

sudo apt install -y zlib1g cudnn

Refer to the cuDNN documentation for installation options and the support matrix.
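
One quick way to confirm which cuDNN version was installed is to list the packages:

# List installed cuDNN packages and their versions
dpkg -l | grep -i cudnn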

NCCL (NVIDIA Collective Communication Library)

To install the latest NCCL

sudo apt install -y libnccl2 libnccl-dev

Refer to the NCCL documentation for installation options.
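
Similarly, you can confirm that the NCCL libraries are visible to the dynamic linker:

# Confirm NCCL shared libraries are registered with the dynamic linker
ldconfig -p | grep libnccl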

DCGM (NVIDIA Data Center GPU Manager)

To install the latest DCGM

sudo apt install -y datacenter-gpu-manager

Refer to the DCGM documentation for more information.
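
dcgmi needs the DCGM host engine running before it can query GPUs. A minimal sketch, assuming the package installs a systemd unit named nvidia-dcgm:

# Start the DCGM host engine (assumed unit name) and list discovered GPUs
sudo systemctl enable --now nvidia-dcgm
dcgmi discovery -l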

Verify

dcgmi -v

Output should be similar to below

Version : 3.3.8
Build ID : 43
Build Date : 2024-09-03
Build Type : Release
Commit ID : be8d66b4318e1d5d6e31b67759dc924d1bc18681
Branch Name : rel_dcgm_3_3
CPU Arch : aarch64
Build Platform : Linux 4.15.0-180-generic #189-Ubuntu SMP Wed May 18 14:13:57 UTC 2022 x86_64
CRC : 93724fdcffc34a2656865a161c2d79df

NVIDIA GPUDirect Storage

To install NVIDIA Magnum IO GPUDirect® Storage (GDS) and libcufile

sudo apt install -y nvidia-gds

To install only the GDS kernel module (nvidia-fs)

sudo apt install -y nvidia-fs

Reboot

Reboot after installation is complete

sudo reboot

Verify

To verify installation

lsmod | grep nvidia_fs

Output should be similar to below

nvidia_fs             262144  0
nvidia              11481088  3 nvidia_uvm,nvidia_fs,nvidia_modeset

If the nvidia-gds meta-package is installed

/usr/local/cuda/gds/tools/gdscheck -p

Output should be similar to below

GDS release version: 1.14.0.30
libcufile version: 2.12
Platform: x86_64
...
...
==============
PLATFORM INFO:
==============
IOMMU: disabled
Nvidia Driver Info Status: Supported(Nvidia Open Driver Installed)
Cuda Driver Version Installed:  12090
Platform: g4dn.xlarge, Arch: x86_64(Linux 6.8.0-1030-aws)
Platform verification succeeded

Refer to the GDS documentation and the Driver installation guide for more information.

Fabric Manager

To install the latest Fabric Manager and driver

sudo apt install -y cuda-drivers-fabricmanager

To install a specific version, e.g. 565

sudo apt install -y cuda-drivers-fabricmanager-565

Refer to the Fabric Manager documentation for supported platforms and installation options.
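
Fabric Manager runs as a systemd service that must be started on NVSwitch-based instances; a minimal sketch, assuming the unit name nvidia-fabricmanager used in the Fabric Manager documentation:

# Enable and start the Fabric Manager service, then check that it is running
sudo systemctl enable --now nvidia-fabricmanager
systemctl status nvidia-fabricmanager --no-pager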

Verify

nv-fabricmanager -v

Output should be similar to below

Fabric Manager version is : 565.57.01