Upgrade nvidia-driver in Amazon EKS AMI with NVIDIA GPU support

The current EKS optimized Amazon Linux AMI ships NVIDIA driver version 470. Unfortunately, our software requires version 510. Is there an official AMI with that version? If not, how could I upgrade the NVIDIA driver in the AMI, perhaps by overriding the bootstrap command?

Asked 2 years ago, viewed 377 times
2 answers

You can automate the installation of the NVIDIA drivers via user data in the launch template for your cluster's node groups. The documentation for managed node group launch templates might help: https://docs.aws.amazon.com/eks/latest/userguide/launch-templates.html
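As a rough illustration of the user data approach, here is a minimal Python sketch that wraps a shell script in a MIME multi-part archive (the format the linked documentation describes for user data on Amazon Linux managed node groups) and base64-encodes it, as launch templates expect. The function name `build_user_data`, the boundary string, and the script body are assumptions made for this example. The install step is deliberately a placeholder, because this thread does not identify where version 510 packages would come from.

```python
import base64
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_user_data(shell_script: str) -> str:
    """Wrap a shell script in a MIME multi-part archive and base64-encode it.

    Hypothetical helper for illustration; the boundary string is arbitrary.
    """
    archive = MIMEMultipart("mixed", boundary="==BOUNDARY==")
    # text/x-shellscript marks the part as a script for cloud-init to run.
    part = MIMEText(shell_script, "x-shellscript", "us-ascii")
    archive.attach(part)
    # Launch template user data must be supplied base64-encoded.
    return base64.b64encode(archive.as_bytes()).decode("ascii")


# Placeholder script: the real driver install commands depend on where
# you obtain the 510 driver, which is an open question in this thread.
INSTALL_SCRIPT = """#!/bin/bash
set -euo pipefail
echo "install the required NVIDIA driver here"
"""

user_data = build_user_data(INSTALL_SCRIPT)
print(user_data)
```

The resulting string could then be supplied as the `UserData` field of the launch template data, for example through the console or boto3's `create_launch_template`. Note that installing a driver at boot lengthens node startup, so this is a sketch of the mechanism and not a recommendation over a custom AMI.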

AWS
Ray K
Answered 2 years ago
  • I don't think installing something at boot on every worker node is an efficient method. It could be done as a last resort, but unfortunately the Amazon repo for NVIDIA packages (which is used for the GPU-supported AMI) doesn't have any newer NVIDIA or CUDA related packages. Could it be updated to include nvidia-510 packages as well? If so, where should such a request be filed?

We have also encountered this issue. Is there a more recent solution? This is a breaking issue with torch 2.

It seems like the recommended approach here is to create a new custom AMI. Deep Learning AMI GPU PyTorch 1.11.0 (Ubuntu 20.04) 20220912 does have 5xx drivers (but my understanding is it has no K8s support), while our EKS AMI has the old drivers. Perhaps we will be able to get a new AMI working properly, but this seems like something that AWS should offer.

Samson
Answered 10 months ago
