What's the best way to monitor NVIDIA GPU utilization on Linux (Ubuntu) during model training?

Which tool is recommended for monitoring NVIDIA GPU utilization on a Linux (Ubuntu) Amazon EC2 instance? I'm currently training custom TensorFlow ML models and using the NVIDIA System Management Interface (nvidia-smi) to track memory usage, GPU utilization, and the temperature of my NVIDIA GPU devices.

AWS
Ioan
Asked 3 years ago · 776 views
1 Answer
Accepted Answer

In addition to nvidia-smi, you can use the Amazon SageMaker Debugger profiling report to capture system metrics, provided the training runs as a SageMaker training job.

The report provides information on the following:

  • System usage statistics
  • Framework metrics
  • Rule evaluation results
  • Step durations
  • GPU utilization
  • Batch size
  • CPU bottlenecks
  • I/O bottlenecks
  • Workload balancing
  • GPU memory
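
If you stay with nvidia-smi on the EC2 instance itself, the same three readings from the question (memory usage, GPU utilization, temperature) can be collected programmatically. The following is only a minimal sketch, not an official AWS or NVIDIA recipe: it assumes nvidia-smi is on the PATH, and the helper names `parse_gpu_stats` and `read_gpu_stats` are made up for illustration. It relies on the `--query-gpu` and `--format=csv,noheader,nounits` options of nvidia-smi.

```python
import csv
import io
import subprocess

# Fields requested from nvidia-smi, in the order they appear in each CSV row.
QUERY_FIELDS = [
    "index",
    "utilization.gpu",   # percent
    "memory.used",       # MiB
    "memory.total",      # MiB
    "temperature.gpu",   # degrees Celsius
]


def parse_gpu_stats(csv_text):
    """Parse nvidia-smi CSV output (noheader, nounits) into one dict per GPU.

    Values are kept as strings because nvidia-smi can report a placeholder
    such as "[N/A]" for fields a device does not support.
    """
    rows = []
    for record in csv.reader(io.StringIO(csv_text), skipinitialspace=True):
        if not record:
            continue  # skip blank lines
        values = [field.strip() for field in record]
        rows.append(dict(zip(QUERY_FIELDS, values)))
    return rows


def read_gpu_stats():
    """Run nvidia-smi once and return the parsed readings (hypothetical helper)."""
    cmd = [
        "nvidia-smi",
        "--query-gpu=" + ",".join(QUERY_FIELDS),
        "--format=csv,noheader,nounits",
    ]
    output = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    return parse_gpu_stats(output)
```

Calling `read_gpu_stats()` in a loop with a short sleep, alongside the training process, gives a simple time series that can be logged or forwarded to a monitoring system of your choice.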
Answered 3 years ago
