Neural network training is slower than usual, almost 2.5x slower

I am using a 4-GPU g5.12xlarge instance to train object detection models. Each iteration normally takes 1 second. This week I started new training runs on 3 different instances, and training is taking longer: 2.5 seconds per iteration. I didn't change the model. I added a small amount of data, but nothing else. I checked my monitoring and can see that some metrics are lower than last week:

CPU utilization, network in, network out, and network packets.

Thank you for helping.

Asked 2 years ago, 184 views
1 Answer

The increased iteration time for your object detection model training could be due to resource contention or changes in the instance's performance. Although you haven't altered the model and only added a small amount of data, variations in CPU utilization, network performance, or overall instance load could be causing slower training. To diagnose the issue, check if GPU utilization is optimal, investigate potential network bottlenecks, and ensure that data loading and preprocessing aren't introducing delays. Also, verify that the instances are performing as expected and consider testing on different instances to rule out hardware variability.
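
One way to check whether data loading or the training step itself accounts for the extra time is to time the two parts of each iteration separately. The following is a minimal sketch, assuming a Python training loop; `profile_iterations`, `batches`, and `step_fn` are hypothetical names used for illustration and are not part of any library or of the original setup.

```python
import time
from typing import Callable, Iterable, Tuple


def profile_iterations(
    batches: Iterable,
    step_fn: Callable,
    max_iters: int = 50,
) -> Tuple[float, float]:
    """Return (mean seconds waiting for a batch, mean seconds in the step).

    `batches` is any iterable that yields training batches (for example a
    data loader) and `step_fn` runs one training step on a batch. Both are
    placeholders for whatever the real training script uses.
    """
    data_time = 0.0
    step_time = 0.0
    count = 0
    it = iter(batches)
    while count < max_iters:
        t0 = time.perf_counter()
        try:
            batch = next(it)  # time spent here is data loading/preprocessing
        except StopIteration:
            break
        t1 = time.perf_counter()
        # Assumption: step_fn blocks until the step has finished. With
        # asynchronous GPU execution, synchronize the device inside step_fn
        # (e.g. torch.cuda.synchronize() in PyTorch), or the step timing
        # will be misleadingly short.
        step_fn(batch)
        t2 = time.perf_counter()
        data_time += t1 - t0
        step_time += t2 - t1
        count += 1
    if count == 0:
        return 0.0, 0.0
    return data_time / count, step_time / count
```

If the mean wait for a batch has grown while the step time is unchanged, the slowdown is more likely in the input pipeline (storage, network, or CPU preprocessing) than in the GPUs. If the step time itself has grown, GPU utilization and the instances are the next things to look at.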

Answered 2 years ago
