Machine Learning image training on EC2 with GPU

0

I am training deep learning models with ten thousand images on a G4 GPU instance, using local storage. Using parallel PyTorch dataloaders, just like I do with on-prem GPU hardware. On-prem, GPU utilization is typically a constant 99% during training and varies during validation steps. On EC2, training flips between 30/maybe up to 70% util and back to zero, for an average of maybe 30-40%. Please suggest how to get more GPU utilization in this scenario.

  • Just to be clear, by "local storage," do you mean EC2 instance storage, or do you mean the root EBS volume for your instance? The two have very different performance characteristics.

1 Risposta
0

Hello,

Thank you for posting your question! You may consider below steps to optimize the GPU setting to get the best performance from the GPU: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/optimize_gpu.html

In the above URL you can specify GPU clock speed to maximum frequency depending on instance type.

AWS
TECNICO DI SUPPORTO
con risposta 2 anni fa

Accesso non effettuato. Accedi per postare una risposta.

Una buona risposta soddisfa chiaramente la domanda, fornisce un feedback costruttivo e incoraggia la crescita professionale del richiedente.

Linee guida per rispondere alle domande