Machine Learning image training on EC2 with GPU


I am training deep learning models on ten thousand images on a G4 GPU instance, using local storage. I use parallel PyTorch dataloaders, just as I do on on-premises GPU hardware. On-premises, GPU utilization is typically a constant 99% during training and varies during validation steps. On EC2, utilization during training swings between roughly 30% and at most 70%, then drops back to zero, for an average of maybe 30 to 40%. How can I get higher GPU utilization in this scenario?
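Before tuning anything, it can help to put a number on the "drops back to zero" pattern. The following is a minimal sketch, assuming `nvidia-smi` (installed with the NVIDIA driver) is on the PATH. It polls GPU utilization while training runs in another process and reports the mean and the share of samples at 0%. The function names are hypothetical.

```python
import statistics
import subprocess
import time
from typing import Dict, List


def parse_utilization(csv_text: str) -> List[int]:
    """Parse `nvidia-smi --query-gpu=utilization.gpu --format=csv,noheader,nounits`.

    Each non-empty line is a bare integer percentage (no header, no "%" unit).
    """
    return [int(line.strip()) for line in csv_text.splitlines() if line.strip()]


def sample_gpu_utilization(seconds: float = 60.0, interval: float = 0.5,
                           gpu_index: int = 0) -> List[int]:
    """Poll one GPU's utilization roughly every `interval` seconds for `seconds` seconds."""
    samples: List[int] = []
    deadline = time.monotonic() + seconds
    while time.monotonic() < deadline:
        out = subprocess.run(
            ["nvidia-smi", f"--id={gpu_index}", "--query-gpu=utilization.gpu",
             "--format=csv,noheader,nounits"],
            capture_output=True, text=True, check=True,
        ).stdout
        samples.extend(parse_utilization(out))
        time.sleep(interval)
    return samples


def summarize(samples: List[int]) -> Dict[str, float]:
    """Mean utilization and the fraction of samples where the GPU was idle (0%)."""
    return {
        "mean": statistics.fmean(samples),
        "idle_fraction": sum(1 for s in samples if s == 0) / len(samples),
    }
```

For example, `summarize(sample_gpu_utilization(seconds=60))` during a training epoch gives one figure to compare against the on-premises machine. A large idle fraction together with a moderate mean often suggests the GPU is waiting on input (data loading or storage) rather than being slow itself.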

  • Just to be clear, by "local storage," do you mean EC2 instance storage, or do you mean the root EBS volume for your instance? The two have very different performance characteristics.
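One rough way to see which kind of storage the images sit on, and whether it keeps up, is to time a single sequential read of the dataset directory. The following is a minimal sketch using only the standard library. The function name is hypothetical, and it measures one process reading files in order, not the parallel access pattern of several dataloader workers. Run it before the files are in the OS page cache (for example, right after the instance starts), because a second run mostly measures RAM. An EBS volume restored from a snapshot also loads blocks lazily on first read, so the first pass over it can be much slower than later ones.

```python
import os
import time
from typing import Optional, Tuple


def measure_read_throughput(root: str, limit: Optional[int] = None) -> Tuple[int, int, float]:
    """Read every file under `root` once (stopping after `limit` files if given).

    Returns (files read, bytes read, elapsed seconds).
    """
    files = 0
    bytes_read = 0
    start = time.perf_counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            with open(os.path.join(dirpath, name), "rb") as fh:
                bytes_read += len(fh.read())
            files += 1
            if limit is not None and files >= limit:
                return files, bytes_read, time.perf_counter() - start
    return files, bytes_read, time.perf_counter() - start


def report(root: str) -> None:
    """Print files/s and MB/s for one pass over `root`."""
    n, b, secs = measure_read_throughput(root)
    secs = max(secs, 1e-9)  # guard against division by zero on an empty directory
    print(f"{n} files, {b / 1e6:.1f} MB in {secs:.2f} s: "
          f"{n / secs:.0f} files/s, {b / 1e6 / secs:.1f} MB/s")
```

Running `report("/path/to/images")` (a placeholder path) once with the data on the root EBS volume and once with a copy on instance storage, if the instance has it, shows whether the read rate is in the same range as the rate at which training consumes images.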

Asked 2 years ago, 314 views
1 answer

Hello,

Thank you for posting your question! You may want to follow the steps on this page to optimize your GPU settings for the best performance: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/optimize_gpu.html

That page explains how to set the GPU clock speeds to their maximum frequency, which depends on the instance type.
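The page lists an `nvidia-smi -ac <memory clock>,<graphics clock>` command per instance type. The following is a minimal sketch of the same idea that reads the supported clock pairs from the driver instead of copying the numbers by hand, then enables persistence mode and applies the highest pair. It assumes a single GPU at index 0 and sudo access. The helper names are hypothetical, and the chosen pair should be checked against the value the AWS page lists for your instance type before it is applied.

```python
import subprocess
from typing import List, Tuple


def parse_supported_clocks(csv_text: str) -> List[Tuple[int, int]]:
    """Parse `nvidia-smi --query-supported-clocks=mem,gr --format=csv,noheader,nounits`.

    Each non-empty line is "<memory MHz>, <graphics MHz>".
    """
    pairs = []
    for line in csv_text.splitlines():
        if line.strip():
            mem, gr = (int(field) for field in line.split(","))
            pairs.append((mem, gr))
    return pairs


def max_application_clocks(pairs: List[Tuple[int, int]]) -> Tuple[int, int]:
    """Highest memory clock, then the highest graphics clock offered at that memory clock."""
    top_mem = max(mem for mem, _ in pairs)
    top_gr = max(gr for mem, gr in pairs if mem == top_mem)
    return top_mem, top_gr


def build_commands(mem: int, gr: int, gpu_index: int = 0) -> List[List[str]]:
    """Enable persistence mode, then pin the application clocks (both need root)."""
    return [
        ["sudo", "nvidia-smi", "-i", str(gpu_index), "-pm", "1"],
        ["sudo", "nvidia-smi", "-i", str(gpu_index), "-ac", f"{mem},{gr}"],
    ]


def apply_max_clocks(gpu_index: int = 0) -> None:
    """Query the supported clocks for one GPU and apply the highest pair."""
    out = subprocess.run(
        ["nvidia-smi", f"--id={gpu_index}", "--query-supported-clocks=mem,gr",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    mem, gr = max_application_clocks(parse_supported_clocks(out))
    for cmd in build_commands(mem, gr, gpu_index):
        print("running:", " ".join(cmd))
        subprocess.run(cmd, check=True)
```

Higher clocks mainly help when the GPU is already busy. If a utilization trace shows long stretches at 0%, the bottleneck is more likely upstream of the GPU, and clock settings alone are unlikely to close the gap.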

AWS
SUPPORT ENGINEER
answered 2 years ago
