Install NVIDIA GPU driver, CUDA Toolkit, NVIDIA Container Toolkit on Amazon EC2 instances running Ubuntu Linux
Steps to install NVIDIA driver, CUDA Toolkit, NVIDIA Container Toolkit, and other NVIDIA software from NVIDIA repository on Ubuntu 24.04 / 22.04 (x86_64/arm64)
Overview
This article describes how to install the NVIDIA Data Center GPU Driver, CUDA Toolkit, NVIDIA Container Toolkit and other NVIDIA software directly from the NVIDIA repository on NVIDIA GPU EC2 instances running Ubuntu on AWS.
Note that by using this method, you agree to the NVIDIA Driver License Agreement, End User License Agreement and other related license agreements. If you are doing development, you may want to register for the NVIDIA Developer Program.
This article applies to Ubuntu Linux on AWS only. Similar articles are available for AL2, AL2023, RHEL/Rocky Linux/AlmaLinux and Windows.
This article installs the NVIDIA Tesla driver, which does not support G6f instances with fractional GPUs. Refer to this article about NVIDIA GRID driver install.
Other Options
If you need AMIs preconfigured with NVIDIA GPU driver, CUDA, other NVIDIA software, and optionally PyTorch or TensorFlow framework, consider AWS Deep Learning AMIs. Refer to Release notes for DLAMIs for currently supported options, and Deep Learning graphical desktop on Ubuntu Linux with AWS Deep Learning AMI (DLAMI) for graphical desktop setup guidance.
Refer to NVIDIA drivers for your Amazon EC2 instance for NVIDIA driver install options and NVIDIA Driver Installation Guide for Tesla driver installation instructions. You can also install NVIDIA driver from Ubuntu repository.
For container workloads, consider Amazon ECS-optimized Linux AMIs and Amazon EKS optimized AMIs
Note: instructions in this article are not applicable to pre-built AMIs.
About CUDA toolkit
Because the CUDA driver is part of the NVIDIA GPU driver, CUDA Toolkit is generally optional when the GPU instance is used to run applications rather than to develop them: a CUDA application typically packages (by statically or dynamically linking against) the CUDA runtime and libraries it needs.
System Requirements
NVIDIA CUDA supports the following platforms
- Ubuntu Linux 24.04 (x86_64 and arm64)
- Ubuntu Linux 22.04 (x86_64 and arm64)
Refer to Driver installation guide for supported kernel versions, compilers and libraries.
Prerequisites
Go to Service Quotas console of your desired Region to verify On-Demand Instance quota value of your desired instance type:
- G instance types: Running On-Demand G and VT instances
- P instance types: Running On-Demand P instances
Request a quota increase if the applied value is less than the vCPU count of your desired EC2 instance size. Do not proceed until your applied quota value is equal to or higher than the vCPU count of your instance type.
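The quota comparison above can be sketched as a small shell helper. This is a minimal sketch, not an official procedure: the function name quota_is_sufficient is hypothetical, and the commented AWS CLI call assumes configured credentials and that the quota name matches the one shown in the Service Quotas console.

```shell
#!/bin/bash
# Sketch: compare an applied On-Demand quota (in vCPUs) with the vCPU count
# of the instance size you plan to launch.

# quota_is_sufficient QUOTA VCPUS
# Exit status 0 when QUOTA >= VCPUS. The quota may be printed as a decimal
# such as 64.0, so the fractional part is dropped before comparing.
quota_is_sufficient() {
  quota_int="${1%%.*}"
  [ "$quota_int" -ge "$2" ]
}

# Assumed usage with the AWS CLI (needs credentials, so it is not run here):
#   QUOTA=$(aws service-quotas list-service-quotas --service-code ec2 \
#     --query "Quotas[?QuotaName=='Running On-Demand G and VT instances'].Value" \
#     --output text)
#   quota_is_sufficient "$QUOTA" 4 && echo "OK to launch" || echo "Request an increase"
```

Replace 4 with the vCPU count of your instance size, and use the P instance quota name for P instance types.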
Prepare Ubuntu Linux
Launch a new NVIDIA GPU instance running Ubuntu Linux preferably with at least 20 GB storage and connect to the instance
Update OS, and install DKMS, kernel headers and development packages
sudo apt update
sudo apt upgrade -y
sudo apt autoremove -y
sudo apt install -y dkms linux-headers-aws linux-modules-extra-aws amazon-ec2-utils unzip gcc make libglvnd-dev pkg-config
Restart your EC2 instance if kernel is updated
sudo reboot
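To decide whether the restart is needed, one option is to check the flag file that Ubuntu normally creates after a kernel or core library update. This is a sketch under the assumption that the package maintaining /var/run/reboot-required is present on your AMI; the function name reboot_needed is hypothetical.

```shell
#!/bin/bash
# Sketch: detect whether Ubuntu has flagged that a reboot is required.

# reboot_needed [FLAG_FILE]
# Exit status 0 when the flag file exists. FLAG_FILE defaults to the
# location Ubuntu conventionally uses; it is a parameter so the check
# can be tried against any path.
reboot_needed() {
  [ -f "${1:-/var/run/reboot-required}" ]
}

# Assumed usage (not run here):
#   if reboot_needed; then sudo reboot; fi
```

If the flag file is not maintained on your system, restarting after the upgrade is the safe choice.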
Add NVIDIA repository
Configure Network Repo installation
DISTRO=$(. /etc/os-release;echo $ID$VERSION_ID | sed -e 's/\.//g')
if (arch | grep -q x86); then
  ARCH=x86_64
else
  ARCH=sbsa
fi
cd /tmp
curl -L -O https://developer.download.nvidia.com/compute/cuda/repos/$DISTRO/$ARCH/cuda-keyring_1.1-1_all.deb
sudo apt install -y ./cuda-keyring_1.1-1_all.deb
sudo apt update
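The commands above derive the repository path from the OS release and CPU architecture. The mapping can be sketched as two helper functions so that the resulting URL can be inspected before downloading. The function names cuda_repo_arch and cuda_keyring_url are hypothetical; the URL layout and keyring file name are the ones used above.

```shell
#!/bin/bash
# Sketch: build the cuda-keyring download URL from OS and architecture values.

# cuda_repo_arch MACHINE
# Map `uname -m` style output to the architecture directory of the NVIDIA repo.
cuda_repo_arch() {
  case "$1" in
    x86_64) echo x86_64 ;;
    aarch64|arm64) echo sbsa ;;
    *) echo "unsupported architecture: $1" >&2; return 1 ;;
  esac
}

# cuda_keyring_url ID VERSION_ID MACHINE
# Example inputs: ubuntu 24.04 x86_64 (ID and VERSION_ID as in /etc/os-release).
cuda_keyring_url() {
  distro=$(printf '%s%s' "$1" "$2" | tr -d '.')
  repo_arch=$(cuda_repo_arch "$3") || return 1
  echo "https://developer.download.nvidia.com/compute/cuda/repos/${distro}/${repo_arch}/cuda-keyring_1.1-1_all.deb"
}

# Assumed usage on the instance (not run here):
#   . /etc/os-release
#   cuda_keyring_url "$ID" "$VERSION_ID" "$(uname -m)"
```

Failing on an unknown architecture, instead of defaulting to sbsa, makes a mistyped or unexpected value visible early.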
If you are installing from an AWS China Region, you may be able to change the repository source from https://developer.download.nvidia.com to https://developer.download.nvidia.cn
if (ec2-metadata -z | grep cn-); then
  sudo sed -i "s/nvidia\.com/nvidia\.cn/g" /etc/apt/sources.list.d/cuda-ubuntu*.list
  sudo apt clean
fi
Install NVIDIA Driver
Option 1: NVIDIA repo driver
To install latest Tesla driver from NVIDIA repository
sudo apt install -y nvidia-open
sudo apt install -y nvidia-xconfig
To install a specific driver branch before R590, e.g. R580 LTSB
sudo apt install -y nvidia-open-580
sudo apt install -y nvidia-xconfig
NVIDIA has removed branch designation from the package name starting from R590. Refer to Version Locking if you want to pin NVIDIA driver branch.
The commands above install the open-source GPU kernel modules, which NVIDIA recommends (and which are different from the Nouveau open-source driver). Refer to Driver Installation Guide about NVIDIA Kernel Modules and installation options.
Option 2: Ubuntu repo driver
Alternatively, pre-compiled NVIDIA modules may be available from Ubuntu repository.
sudo apt update
VERSION=$(apt-cache search "nvidia-driver" | grep "^nvidia-driver-.*-server-open" | cut -d"-" -f3 | sort -r | head -1)
sudo apt install -y linux-modules-nvidia-$VERSION-server-open-aws nvidia-headless-no-dkms-$VERSION-server-open nvidia-driver-$VERSION-server-open nvidia-utils-$VERSION-server
sudo apt install -y nvidia-settings
P instance
If you are using a P instance with multiple GPUs, you may need to install Fabric Manager. Refer to UFM (Unified Fabric Manager) section below for details.
Optional: Compute-only and Desktop Installation
The NVIDIA repository also offers a custom installation method for the following configurations:
- Desktop: Contains all the X/Wayland drivers and libraries to allow running a GPU with power management enabled on a desktop system but does not include any CUDA component
- Compute-only / headless: Contains everything required to run CUDA applications on a GPU system where the GPU is not used to drive a display
- Desktop and Compute: the canonical way of installing the driver, with every possible library and display component. This might be required for cross-functional combinations, for example CUDA-accelerated video encoding/decoding.
To install for the above cases:
- Desktop only:
sudo apt install -y libnvidia-gl nvidia-dkms-open
- Compute-only/headless:
sudo apt install -y libnvidia-compute nvidia-dkms-open
- Desktop and Compute:
sudo apt install -y nvidia-open
Refer to NVIDIA Driver Installation Guide for more information.
Verify
Restart your instance
nvidia-smi
Output should be similar to below
Mon Dec 22 00:24:37 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 590.44.01 Driver Version: 590.44.01 CUDA Version: 13.1 |
+-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA A10G On | 00000000:00:1E.0 Off | 0 |
| 0% 30C P8 10W / 300W | 0MiB / 23028MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+
Refer to the section Verify installation integrity for steps to verify CUDA driver integrity.
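For scripted checks, the driver version can be pulled out of the nvidia-smi banner shown above. The following is a minimal sketch; the function names driver_version and driver_branch are hypothetical, and the parsing assumes the banner keeps the "Driver Version: X.Y.Z" wording shown in the sample output.

```shell
#!/bin/bash
# Sketch: extract the driver version from nvidia-smi banner text on stdin.

# driver_version
# Print the first "Driver Version" value found, e.g. 590.44.01.
driver_version() {
  sed -n -E 's/.*Driver Version: ([0-9.]+).*/\1/p' | head -n 1
}

# driver_branch
# Print only the branch (major) number, e.g. 590.
driver_branch() {
  driver_version | cut -d. -f1
}

# Assumed usage on a GPU instance (not run here):
#   nvidia-smi | driver_version
#   nvidia-smi | driver_branch
```

nvidia-smi can also print this value directly with nvidia-smi --query-gpu=driver_version --format=csv,noheader, which avoids parsing the banner.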
Optional: CUDA Toolkit
To install latest CUDA Toolkit
sudo apt install -y cuda-toolkit
To install a specific series, e.g. 12.x
sudo apt install -y cuda-toolkit-12
To install a specific version, e.g. 12.9
sudo apt install -y cuda-toolkit-12-9
Refer to CUDA Toolkit documentation about supported platforms and installation options.
Verify
/usr/local/cuda/bin/nvcc -V
Output should be similar to below
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2025 NVIDIA Corporation
Built on Fri_Nov__7_07:23:37_PM_PST_2025
Cuda compilation tools, release 13.1, V13.1.80
Build cuda_13.1.r13.1/compiler.36836380_0
Post-installation Actions
Refer to NVIDIA CUDA Installation Guide for Linux for post-installation actions before CUDA Toolkit can be used. For example, you may want to modify your PATH and LD_LIBRARY_PATH environment variables to include /usr/local/cuda/bin and /usr/local/cuda/lib64 respectively
sed -i '$aexport PATH=$PATH:/usr/local/cuda/bin' ~/.bashrc
sed -i '$aexport LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64' ~/.bashrc
. ~/.bashrc
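Running the sed commands above more than once appends the same export lines again. A possible alternative, sketched below, appends a line only when it is not already present. The function name append_once is hypothetical; it relies only on standard grep and printf.

```shell
#!/bin/bash
# Sketch: append a line to a file only if an identical line is not there yet.

# append_once LINE FILE
append_once() {
  line="$1"
  file="$2"
  touch "$file"
  # -x matches whole lines, -F treats LINE as a fixed string, -q is quiet.
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Assumed usage (not run here); single quotes keep $PATH unexpanded in the file:
#   append_once 'export PATH=$PATH:/usr/local/cuda/bin' ~/.bashrc
#   append_once 'export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64' ~/.bashrc
```

This keeps ~/.bashrc unchanged on repeated runs, which helps when the same setup script is re-applied.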
Optional: NVIDIA Container Toolkit
NVIDIA Container toolkit supports Ubuntu on both x86_64 and arm64. For arm64, use g5g.2xlarge or larger instance size as g5g.xlarge may cause failures due to the limited system memory.
To install latest NVIDIA Container Toolkit
sudo apt install -y nvidia-container-toolkit
Refer to NVIDIA Container toolkit documentation about supported platforms, prerequisites and installation options
Verify
nvidia-container-cli -V
Output should be similar to below
cli-version: 1.18.1
lib-version: 1.18.1
build date: 2025-11-24T14:45+00:00
build revision: 889a3bb5408c195ed7897ba2cb8341c7d249672f
build compiler: x86_64-linux-gnu-gcc-7 7.5.0
build platform: x86_64
build flags: -D_GNU_SOURCE -D_FORTIFY_SOURCE=2 -DNDEBUG -std=gnu11 -O2 -g -fdata-sections -ffunction-sections -fplan9-extensions -fstack-protector -fno-strict-aliasing -fvisibility=hidden -Wall -Wextra -Wcast-align -Wpointer-arith -Wmissing-prototypes -Wnonnull -Wwrite-strings -Wlogical-op -Wformat=2 -Wmissing-format-attribute -Winit-self -Wshadow -Wstrict-prototypes -Wunreachable-code -Wconversion -Wsign-conversion -Wno-unknown-warning-option -Wno-format-extra-args -Wno-gnu-alignof-expression -Wl,-zrelro -Wl,-znow -Wl,-zdefs -Wl,--gc-sections
Container engine configuration
Refer to NVIDIA Container Toolkit documentation about container engine configuration.
Install and configure Docker
To install and configure docker
sudo apt install -y docker.io
sudo usermod -aG docker ubuntu
sudo systemctl enable docker
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
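Before running a GPU container, you may want to confirm that the runtime configuration step registered the NVIDIA runtime. The sketch below assumes that nvidia-ctk wrote a "nvidia" runtime entry to /etc/docker/daemon.json, which is its usual behaviour for Docker; the function name docker_has_nvidia_runtime is hypothetical, and the check is a plain text search rather than a JSON parse.

```shell
#!/bin/bash
# Sketch: check whether a Docker daemon config mentions an "nvidia" runtime.

# docker_has_nvidia_runtime [CONFIG_FILE]
# CONFIG_FILE defaults to /etc/docker/daemon.json (assumed location).
docker_has_nvidia_runtime() {
  config="${1:-/etc/docker/daemon.json}"
  [ -r "$config" ] && grep -q '"nvidia"' "$config"
}

# Assumed usage (not run here):
#   docker_has_nvidia_runtime && echo "NVIDIA runtime is configured"
```

A text search can give a false positive if the word appears elsewhere in the file, so treat it as a quick sanity check and rely on the container test for confirmation.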
Verify Docker engine configuration
To verify docker configuration
sudo docker run --rm --runtime=nvidia --gpus all public.ecr.aws/ubuntu/ubuntu:latest nvidia-smi
Output should be similar to below
Unable to find image 'public.ecr.aws/ubuntu/ubuntu:latest' locally
latest: Pulling from ubuntu/ubuntu
16c195d4c5e9: Pull complete
Digest: sha256:70c941f44c475633b5968e549eea587e78de9d60166408c4ffcd87a3e30ec713
Status: Downloaded newer image for public.ecr.aws/ubuntu/ubuntu:latest
Mon Dec 22 00:25:41 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 590.44.01 Driver Version: 590.44.01 CUDA Version: 13.1 |
+-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA A10G On | 00000000:00:1E.0 Off | 0 |
| 0% 30C P8 11W / 300W | 0MiB / 23028MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+
EC2 Install Script
You can use the below as install script (or user data) to install GPU driver and NVIDIA Container Toolkit on a new Ubuntu NVIDIA GPU instance preferably with latest patches applied and at least 20 GB storage.
Remove the leading # character from the commented lines (but not from the first line, #!/bin/bash) if you wish to install CUDA Toolkit.
#!/bin/bash
export DEBIAN_FRONTEND=noninteractive
sudo apt update
sudo apt upgrade -y
sudo apt autoremove -y
sudo apt install -y dkms linux-headers-aws linux-modules-extra-aws amazon-ec2-utils unzip gcc make libglvnd-dev pkg-config
DISTRO=$(. /etc/os-release;echo $ID$VERSION_ID | sed -e 's/\.//g')
if (arch | grep -q x86); then
  ARCH=x86_64
else
  ARCH=sbsa
fi
cd /tmp
curl -L -O https://developer.download.nvidia.com/compute/cuda/repos/$DISTRO/$ARCH/cuda-keyring_1.1-1_all.deb
sudo apt install -y ./cuda-keyring_1.1-1_all.deb
sudo apt update
sudo apt install -y nvidia-open
sudo apt install -y nvidia-xconfig
USER=ubuntu
# sudo apt install -y cuda-toolkit
# sed -i '$aexport PATH=$PATH:/usr/local/cuda/bin' /home/$USER/.bashrc
# sed -i '$aexport LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64' /home/$USER/.bashrc
sudo apt install -y docker.io
sudo usermod -aG docker $USER
sudo systemctl enable docker
sudo apt install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
if ( ec2-metadata -t | grep -q " p[0-9]" ); then
  sudo apt install -y nvidia-fabricmanager libnvidia-nscq libnvsdm nvidia-imex
  if ( ec2-metadata -t | grep -q " p[6-9]" ); then
    sudo apt install -y nvlsm infiniband-diags
    echo "ib_umad" | sudo tee -a /etc/modules-load.d/modules.conf
    sudo modprobe ib_umad
  fi
  sudo systemctl enable --now nvidia-fabricmanager
fi
sudo reboot
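The script decides which extra packages to install by matching the instance type reported by ec2-metadata -t. That matching logic can be sketched in isolation as follows, which makes it easy to check against instance type strings without launching anything. The function names are hypothetical; the patterns are the ones used in the script, and the input format "instance-type: <type>" is assumed to be what ec2-metadata -t prints.

```shell
#!/bin/bash
# Sketch: the instance-family checks from the install script as functions.

# is_p_instance TYPE_LINE
# True for any P instance, e.g. "instance-type: p4d.24xlarge".
is_p_instance() {
  printf '%s\n' "$1" | grep -q ' p[0-9]'
}

# is_p6_or_newer TYPE_LINE
# True for P6 and later generations, where the script also installs NVLSM.
is_p6_or_newer() {
  printf '%s\n' "$1" | grep -q ' p[6-9]'
}

# Assumed usage on the instance (not run here):
#   TYPE_LINE=$(ec2-metadata -t)
#   is_p_instance "$TYPE_LINE" && echo "install Fabric Manager packages"
#   is_p6_or_newer "$TYPE_LINE" && echo "install NVLSM as well"
```

The leading space in each pattern anchors the match to the start of the instance type, so a type such as g5.xlarge does not match.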
Verify
Connect to your EC2 instance.
nvidia-smi
nvidia-container-cli -V
sudo docker run --rm --runtime=nvidia --gpus all public.ecr.aws/ubuntu/ubuntu:latest nvidia-smi
If used as user data, view /var/log/cloud-init-output.log to troubleshoot any installation issues.
Perform post-installation actions in order to use CUDA toolkit (if installed).
Verify installation integrity
NVIDIA driver and NVIDIA Container Toolkit
To verify integrity of installation, you can use CUDA samples container image to validate CUDA driver.
sudo docker run --rm --runtime=nvidia --gpus all nvcr.io/nvidia/k8s/cuda-sample:devicequery
Ensure you get Result = PASS output.
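If you run this check from a script, the PASS line can be tested automatically. This is a minimal sketch; the function name device_query_passed is hypothetical and it simply searches the container output for the "Result = PASS" text mentioned above.

```shell
#!/bin/bash
# Sketch: succeed only when deviceQuery output on stdin reports a pass.

# device_query_passed
# Reads text from stdin; exit status 0 when it contains "Result = PASS".
device_query_passed() {
  grep -q 'Result = PASS'
}

# Assumed usage on a GPU instance with Docker configured (not run here):
#   sudo docker run --rm --runtime=nvidia --gpus all \
#     nvcr.io/nvidia/k8s/cuda-sample:devicequery | device_query_passed \
#     && echo "CUDA driver check passed" || echo "CUDA driver check failed"
```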
NVIDIA driver and CUDA Toolkit
If CUDA toolkit is installed, you can download, compile and run CUDA samples such as deviceQuery.
If you are using a P instance with multiple GPUs, you may need to install Fabric Manager. Refer to UFM (Unified Fabric Manager) section below for instructions
GUI (graphical desktop) remote access
If you need remote graphical desktop access, refer to Install GUI (graphical desktop) on Amazon EC2 instances running Ubuntu Linux
This article installs the NVIDIA Tesla driver (also known as NVIDIA Datacenter Driver), which is intended primarily for GPU compute workloads. If configured in xorg.conf, Tesla drivers support one display of up to 2560x1600 resolution.
GRID drivers provide access to four 4K displays per GPU and are certified to provide optimal performance for professional visualization applications. Refer to GPU-accelerated graphical desktop on Ubuntu Linux with NVIDIA GRID and Amazon DCV for setup guidance.
Other software
AWS CLI
To install AWS CLI (AWS Command Line Interface) v2 through Snap
sudo snap install aws-cli --classic
Verify
aws --version
Output should be similar to below
aws-cli/2.27.53 Python/3.13.4 Linux/6.8.0-1029-aws exe/x86_64.ubuntu.24
cuDNN (CUDA Deep Neural Network library)
To install cuDNN for the latest available CUDA version.
sudo apt install -y zlib1g cudnn
Refer to cuDNN documentation about installation options and support matrix
NCCL (NVIDIA Collective Communication Library)
To install latest NCCL
sudo apt install -y libnccl2 libnccl-dev
Refer to NCCL documentation about installation options
DCGM (Data Center GPU Manager)
To install DCGM
CUDA_VERSION=$(nvidia-smi | sed -E -n 's/.*CUDA Version: ([0-9]+)[.].*/\1/p')
sudo apt-get install --yes \
  --install-recommends \
  datacenter-gpu-manager-4-cuda${CUDA_VERSION}
Refer to DCGM documentation for more information
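The DCGM package name depends on the CUDA major version reported by the driver. The derivation can be sketched as a function that fails when no version is found, so that apt is never called with an incomplete package name. The function name dcgm_package_name is hypothetical; the sed expression and package name pattern are the ones used in the install command.

```shell
#!/bin/bash
# Sketch: derive the DCGM package name from nvidia-smi banner text on stdin.

# dcgm_package_name
# Prints e.g. datacenter-gpu-manager-4-cuda13, or returns 1 if no CUDA
# version is present in the input.
dcgm_package_name() {
  cuda_major=$(sed -E -n 's/.*CUDA Version: ([0-9]+)[.].*/\1/p' | head -n 1)
  [ -n "$cuda_major" ] || return 1
  echo "datacenter-gpu-manager-4-cuda${cuda_major}"
}

# Assumed usage on a GPU instance (not run here):
#   PKG=$(nvidia-smi | dcgm_package_name) && sudo apt-get install --yes --install-recommends "$PKG"
```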
Verify
dcgmi --version
Output should be similar to below
dcgmi version: 4.4.2
GDS (GPUDirect Storage)
To install NVIDIA Magnum IO GPUDirect® Storage (GDS)
sudo apt install -y nvidia-gds
To install for a specific CUDA version, e.g. 13.0
sudo apt install -y nvidia-gds-13-0
Reboot
Restart to load kernel module
sudo reboot
Verify
To verify module
lsmod | grep nvidia_fs
Output should be similar to below
nvidia_fs 262144 0
nvidia 11481088 3 nvidia_uvm,nvidia_fs,nvidia_modeset
To verify successful installation
sudo /usr/local/cuda/gds/tools/gdscheck -p
Output should be similar to below
GDS release version: 1.16.0.49
nvidia_fs minimum version: 2.12
Platform: x86_64
...
...
=========
GPU INFO:
=========
GPU index 0 NVIDIA A10G bar:1 bar size (MiB):32768 supports GDS, IOMMU State: Disabled
==============
PLATFORM INFO:
==============
IOMMU: disabled
Nvidia Driver Info Status: Supported(Nvidia Open Driver Installed)
Cuda Driver Version Installed: 13010
Platform: g5.xlarge, Arch: x86_64(Linux 6.14.0-1018-aws)
Platform verification succeeded
Refer to GDS documentation and Driver installation guide for more information
GDRCopy
Magnum IO GDRCopy packages for different CUDA versions can be installed from the NVIDIA Developer download site. Alternatively, download and compile from GitHub.
Restart your EC2 instance
sudo reboot
Verify
lsmod | grep gdr
Output should be similar to below
gdrdrv 28672 0
nvidia 14376960 7 nvidia_uvm,gdrdrv,nvidia_modeset
CUDA-X Libraries
NVIDIA repository also provides access to CUDA Math, Quantum and other libraries such as cuTENSOR, cuFFT and cuQuantum. Refer to NVIDIA site for more information
UFM (Unified Fabric Manager)
Eligibility
To determine if you need NVIDIA Unified Fabric Manager (UFM)
nvidia-smi -q -i 0 | grep Fabric -A2 | grep State
If State is N/A, you do not need Fabric Manager
State : N/A
If State is not N/A, install Fabric Manager as per next section
State : In Progress
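The eligibility rule above can be sketched as a pair of shell functions for use in automation. The function names fabric_state and fabric_manager_required are hypothetical. The parsing assumes the "Fabric" block layout printed by nvidia-smi -q as in the examples above, and treating an empty result (no Fabric block at all) as "not required" is an assumption of this sketch.

```shell
#!/bin/bash
# Sketch: read the Fabric State from `nvidia-smi -q` text on stdin and decide
# whether Fabric Manager is needed.

# fabric_state
# Prints the State value from the Fabric block, e.g. "N/A" or "In Progress".
fabric_state() {
  grep -A2 'Fabric' \
    | sed -n -E 's/^[[:space:]]*State[[:space:]]*:[[:space:]]*(.*)$/\1/p' \
    | head -n 1
}

# fabric_manager_required STATE
# True when STATE is non-empty and not N/A.
fabric_manager_required() {
  [ -n "$1" ] && [ "$1" != "N/A" ]
}

# Assumed usage on a GPU instance (not run here):
#   STATE=$(nvidia-smi -q -i 0 | fabric_state)
#   fabric_manager_required "$STATE" && echo "install Fabric Manager"
```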
Install
To install latest NVIDIA Unified Fabric Manager (UFM), NSCQ, NVSDM, IMEX for EC2 instances with NVIDIA NVLink.
sudo apt install -y nvidia-fabricmanager libnvidia-nscq libnvsdm nvidia-imex
sudo systemctl enable --now nvidia-fabricmanager
P6 instance
P6 instance requires NVLink Subnet Manager (NVLSM)
sudo apt install -y nvlsm infiniband-diags
echo "ib_umad" | sudo tee -a /etc/modules-load.d/modules.conf
sudo modprobe ib_umad
sudo systemctl restart nvidia-fabricmanager
Refer to EC2 and NVIDIA documentation for up to date instructions.
Verify
nv-fabricmanager -v
systemctl status nvidia-fabricmanager
Output should be similar to below
Fabric Manager version is : 590.44.01
● nvidia-fabricmanager.service - NVIDIA fabric manager service
Loaded: loaded (/usr/lib/systemd/system/nvidia-fabricmanager.service; enabled; preset: enabled)
Active: active (running) since ......... UTC; 1min 4s ago
Process: 22851 ExecStart=/usr/bin/nvidia-fabricmanager-start.sh --mode start (code=exited, status=0/SUCCESS)
Main PID: 22881 (nv-fabricmanage)
Tasks: 18 (limit: 3355442)
Memory: 38.1M
CPU: 633ms
CGroup: /system.slice/nvidia-fabricmanager.service
└─22881 /usr/bin/nv-fabricmanager -c /usr/share/nvidia/nvswitch/fabricmanager.cfg
.........compute.internal nv-fabricmanager[22881]: Starting nvidia-fabricmanager.service - NVIDIA fabric manager service...
.........compute.internal nv-fabricmanager[22881]: Detected Pre-NVL5 system
.........compute.internal nv-fabricmanager[22881]: Connected to 1 node.
.........compute.internal nv-fabricmanager[22881]: Successfully configured all the available NVSwitches to route GPU NVLink traffic. NVLink Peer-to-Peer support will be enabled once the GPUs are successfully registered with the NVLink fabric.
.........compute.internal nv-fabricmanager[22881]: Started "Nvidia Fabric Manager"
.........compute.internal nv-fabricmanager[22881]: Started nvidia-fabricmanager.service - NVIDIA fabric manager service.
To view GPU fabric registration status
nvidia-smi -q -i 0 | grep -i -A 2 Fabric
Output should be similar to below after the GPUs have successfully registered
Fabric
State : Completed
Status : Success
Refer to Fabric Manager documentation for more information.
