2 answers
0
You can automate the installation of the NVIDIA drivers via user data in the launch template for your cluster's node groups. The docs for managed node group launch templates might help: https://docs.aws.amazon.com/eks/latest/userguide/launch-templates.html
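A minimal sketch of how such user data could be prepared, assuming an Amazon Linux based EKS AMI, where launch template user data for managed node groups is expected as a MIME multi-part archive. The script body is only a placeholder and `build_user_data` is a hypothetical helper name; the actual driver install commands depend on your AMI and the driver version you need, so they are left out.

```python
import base64
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

# Placeholder script (assumption): replace the echo line with the driver
# install commands that fit your AMI and target driver version.
INSTALL_SCRIPT = """#!/bin/bash
set -euo pipefail
echo "install the NVIDIA driver here"
"""


def build_user_data(script: str) -> str:
    """Wrap a shell script in a MIME multi-part archive and base64-encode it."""
    archive = MIMEMultipart("mixed")
    # text/x-shellscript marks the part as a script to run at boot.
    archive.attach(MIMEText(script, "x-shellscript", "us-ascii"))
    return base64.b64encode(archive.as_bytes()).decode("ascii")


user_data = build_user_data(INSTALL_SCRIPT)
```

The resulting string could then be supplied as the `UserData` field of the launch template data, for example through boto3's `create_launch_template`. Note that this only automates the install; it still runs on every node boot.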
answered 2 years ago
0
We have also encountered this issue. Is there a more recent solution? This is a breaking issue with torch 2.
The recommended approach here seems to be creating a new custom AMI. Deep Learning AMI GPU PyTorch 1.11.0 (Ubuntu 20.04) 20220912 does have 5xx drivers (but my understanding is that it has no K8s support), while our EKS AMI has the old drivers. Perhaps we will be able to get a new AMI working properly, but this seems like something that AWS should offer.
answered 10 months ago
I don't think installing something at boot of the worker node is an efficient method. It could be done as a last resort, but unfortunately the Amazon repo for NVIDIA packages (which is used for the GPU-supported AMI) doesn't have any newer NVIDIA or CUDA packages. Could it be updated to include the nvidia-510 packages as well? If so, where should such a request be filed?