Machine Learning image training on EC2 with GPU

I am training deep learning models on ten thousand images on a G4 GPU instance, using local storage. I use parallel PyTorch dataloaders, just as I do on on-prem GPU hardware. On-prem, GPU utilization is typically a constant 99% during training and varies during validation steps. On EC2, utilization during training swings between about 30% and at most 70%, then drops back to zero, for an average of roughly 30-40%. How can I get higher GPU utilization in this scenario?

MichaelFischer (Expert)
2 years ago
Just to be clear, by "local storage," do you mean EC2 instance storage, or do you mean the root EBS volume for your instance? The two have very different performance characteristics.
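Whichever kind of volume it turns out to be, one rough way to check whether data loading is the limiting factor is to time how long the training loop waits for each batch versus how long it spends on the training step. The sketch below is only an illustration: `profile_loop`, `loader`, and `step_fn` are hypothetical names, `loader` stands in for your PyTorch `DataLoader`, and `step_fn` for your forward/backward/optimizer step. On a GPU the step should end with `torch.cuda.synchronize()`, otherwise asynchronous kernel launches make the step look faster than it is.

```python
import time


def profile_loop(loader, step_fn, max_batches=100):
    """Return (seconds waiting for data, seconds in step_fn, batches seen).

    `loader` is any iterable of batches and `step_fn` is a callable that
    consumes one batch; both names are placeholders for illustration.
    """
    wait_s = 0.0
    step_s = 0.0
    n = 0
    it = iter(loader)
    while n < max_batches:
        t0 = time.perf_counter()
        try:
            batch = next(it)          # time spent blocked on the loader
        except StopIteration:
            break
        t1 = time.perf_counter()
        step_fn(batch)                # time spent in the training step
        t2 = time.perf_counter()
        wait_s += t1 - t0
        step_s += t2 - t1
        n += 1
    return wait_s, step_s, n


if __name__ == "__main__":
    # Stand-in loader and step so the sketch runs without a GPU or PyTorch.
    def slow_loader(num_batches=5, delay=0.02):
        for i in range(num_batches):
            time.sleep(delay)         # pretend to read and decode images
            yield i

    wait_s, step_s, n = profile_loop(slow_loader(), lambda b: time.sleep(0.005))
    print(f"{n} batches: waited {wait_s:.3f}s on data, {step_s:.3f}s in step")
```

If the wait time dominates, the GPU is being starved by storage or preprocessing rather than running slowly itself.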

Topics

Compute, Machine Learning & AI

Tags

Amazon EC2, Machine Learning & AI

AWS-User-0426098

Asked 2 years ago, 314 views

1 Answer

Hello,

Thank you for posting your question! To get the best performance from the GPU, consider the optimization steps described here: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/optimize_gpu.html

That page explains how to set the GPU clock speed to its maximum frequency, which depends on the instance type.
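As a rough illustration of what applying those settings could look like, here is a minimal Python sketch that builds the `nvidia-smi` commands and only prints them unless told to run them. It assumes a G4dn instance with an NVIDIA GPU and the NVIDIA driver installed. The clock pair in `ASSUMED_MAX_CLOCKS` is an assumption for illustration, not something stated in this thread, so please confirm it against the linked page or with `nvidia-smi -q -d SUPPORTED_CLOCKS` first. The helper names are hypothetical.

```python
import subprocess

# ASSUMED (memory MHz, graphics MHz) application clocks per instance family.
# Illustrative values only: verify against the linked AWS page or the output
# of `nvidia-smi -q -d SUPPORTED_CLOCKS` before applying them.
ASSUMED_MAX_CLOCKS = {
    "g4dn": (5001, 1590),
}


def clock_commands(family):
    """Build the nvidia-smi commands for the given instance family."""
    mem_mhz, gfx_mhz = ASSUMED_MAX_CLOCKS[family]
    return [
        # Enable persistence mode so the settings are kept between jobs.
        ["sudo", "nvidia-smi", "-pm", "1"],
        # Set application clocks as "<memory>,<graphics>".
        ["sudo", "nvidia-smi", "-ac", f"{mem_mhz},{gfx_mhz}"],
    ]


def apply_clocks(family, dry_run=True):
    """Print the commands; run them only when dry_run is False."""
    shown = []
    for cmd in clock_commands(family):
        line = " ".join(cmd)
        shown.append(line)
        print(line)
        if not dry_run:
            subprocess.run(cmd, check=True)
    return shown


if __name__ == "__main__":
    apply_clocks("g4dn")  # dry run: prints the commands without executing
```

The dry-run default is deliberate, so you can compare the printed commands with the documentation before changing anything on the instance.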

Support Engineer

AWS-User-8128162

Answered 2 years ago
