g4dn EC2 refusing to use T4

0

I set up the instance, installed the NVIDIA drivers, and installed DCV. On connect, the session uses an Amazon display adapter ("AWS Indirect Display Device" according to dxdiag) rather than the NVIDIA one. The OS is Windows Server 2025. nvidia-smi output:

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 573.07                 Driver Version: 573.07         CUDA Version: 12.8     |
|-----------------------------------------+------------------------+----------------------|
| GPU  Name                  Driver-Model | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  Tesla T4                     TCC   |   00000000:00:1E.0 Off |                    0 |
| N/A   33C    P0             24W /   70W |     163MiB /  15360MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
+-----------------------------------------------------------------------------------------+

That seems fine, but how can I make the T4 the default display adapter?

Asked 5 months ago · 156 views
2 Answers
2

From your nvidia-smi output, your driver is running in TCC mode, not WDDM mode. In TCC mode the GPU is not used for graphics acceleration. For example, your process list is empty.
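One way to check the driver model without reading the table by eye is to ask nvidia-smi for it in CSV form and parse the result. The following is a minimal sketch, assuming the Windows-only query field `driver_model.current` is available in your driver version; the helper names (`parse_driver_models`, `can_drive_display`, `query_driver_models`) are made up for illustration.

```python
import subprocess

# Assumption: nvidia-smi is on PATH and supports the Windows-only
# query field `driver_model.current`.
QUERY = [
    "nvidia-smi",
    "--query-gpu=name,driver_model.current",
    "--format=csv,noheader",
]


def parse_driver_models(csv_text):
    """Turn lines such as 'Tesla T4, TCC' into (name, model) tuples."""
    gpus = []
    for line in csv_text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the last comma so a comma inside the GPU name is kept.
        name, _, model = line.rpartition(",")
        gpus.append((name.strip(), model.strip()))
    return gpus


def can_drive_display(model):
    """Only WDDM exposes the GPU to Windows as a display adapter."""
    return model.upper() == "WDDM"


def query_driver_models():
    """Run nvidia-smi and parse its output (needs a GPU instance)."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_driver_models(result.stdout)


if __name__ == "__main__":
    # Sample text mirroring the question's instance, so this runs anywhere.
    for name, model in parse_driver_models("Tesla T4, TCC\n"):
        print(f"{name}: {model}, display capable: {can_drive_display(model)}")
```

On the instance itself, calling `query_driver_models()` instead of parsing the sample string should reflect the live driver state. A result of TCC means the installed driver is the compute-only one.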

Uninstall the current driver and install the NVIDIA GRID (aka NVIDIA RTX Virtual Workstation) driver as per the documentation. Alternatively, you can refer to this blog for installation using CloudFormation. Example output of nvidia-smi and nvidia-smi -q | findstr "Product License":

[Screenshot: nvidia-smi and nvidia-smi -q | findstr "Product License" output]

AWS
EXPERT
Answered 5 months ago
-1

The issue you're experiencing is that your g4dn instance has the NVIDIA T4 GPU properly installed (as shown by your nvidia-smi output), but it's not being used for display rendering in your DCV session.

This is actually expected behavior. The NVIDIA T4 GPU on g4dn instances is configured in TCC (Tesla Compute Cluster) mode as shown in your nvidia-smi output, which optimizes it for compute workloads rather than graphics display. The AWS indirect display device is handling the display rendering while the T4 is available for compute tasks.

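To use the GPU for graphics acceleration in a DCV session on a Linux server (these steps rely on the X server and the nice-dcv-gl package, so they do not apply to Windows), you need to: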

  1. Make sure you have the DCV-GL package installed (nice-dcv-gl)
  2. Ensure the NVIDIA driver is properly configured for DCV
  3. Run the following commands to enable GPU acceleration for DCV:
  • Stop the X server
  • Run dcvgladmin disable
  • Run dcvgladmin enable
  • Restart the X server
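The disable/enable sequence above can be sketched as an ordered command list. This is a rough illustration, assuming a systemd-based Linux DCV server where switching to `multi-user.target` stops the X server and switching to `graphical.target` starts it again; the function name `dcvgl_reset_commands` is hypothetical, and nothing is executed unless you pass the commands to `subprocess.run` yourself.

```python
def dcvgl_reset_commands(use_sudo=True):
    """Return the DCV-GL reset steps, in order, as argv lists.

    Assumption: a systemd-based Linux host with nice-dcv-gl installed.
    """
    prefix = ["sudo"] if use_sudo else []
    steps = [
        ["systemctl", "isolate", "multi-user.target"],  # stop the X server
        ["dcvgladmin", "disable"],
        ["dcvgladmin", "enable"],
        ["systemctl", "isolate", "graphical.target"],   # restart the X server
    ]
    return [prefix + step for step in steps]


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for argv in dcvgl_reset_commands():
        print(" ".join(argv))
```

Printing the commands first makes it easy to review the order before running anything on a live session host, since stopping the X server ends any console sessions.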

You can verify if OpenGL acceleration is working by running the dcvgltest tool from the nice-dcv-gltest package.

If you're using a DLAMI (Deep Learning AMI), make sure you're using the correct one for your g4dn instance. AWS provides specific DLAMIs that use the NVIDIA OSS driver which supports G4dn instances.
Sources
Issues gdscheck -p - receiving: CUDA_ERROR_SYSTEM_NOT_READY | AWS re:Post
DCV cannot access the 3D X Server [:0.0 :0.1]. | AWS re:Post
Specifications for Amazon EC2 accelerated computing instances - Amazon EC2
Important NVIDIA driver changes to DLAMIs - AWS Deep Learning AMIs

Answered 5 months ago
