
Training NN is slower than usual, almost 2.5x slower


I am using a 4-GPU g5.12xlarge instance to train object detection models. Each iteration used to take 1 sec. This week I started new training runs on 3 different instances, and training is taking longer: 2.5 sec per iteration. I didn't change the model. I only added a small amount of data, nothing else. I checked my monitoring and can see that some metrics are lower than last week:

CPU utilization, network in, network out, and network packets.

Thank you for helping.

asked 2 years ago · 178 views
1 Answer

The increased iteration time could be due to resource contention or a change in the instances' performance. Although you haven't altered the model and only added a small amount of data, changes in CPU utilization, network throughput, or overall instance load could be slowing training down. Note that lower CPU utilization and lower network traffic alongside slower iterations can mean the GPUs are waiting for input data rather than running at full load. To diagnose the issue, check whether GPU utilization is where it was before (for example with nvidia-smi), look for network bottlenecks, and make sure data loading and preprocessing aren't introducing delays. Also verify that the instances are performing as expected, and consider testing on different instances to rule out hardware variability.
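One way to act on the data-loading suggestion is to time how long each iteration waits for the next batch versus how long the training step itself takes. Below is a minimal, framework-agnostic sketch in Python. `profile_iterations`, `slow_loader`, and `fake_step` are hypothetical names, and the loader and step are simulated with `time.sleep`. In a real PyTorch loop you would pass your DataLoader and a function that runs one training step. Because CUDA kernels run asynchronously, that step function should call `torch.cuda.synchronize()` before returning so the compute time is measured accurately (an assumption about your setup).

```python
import time
from collections.abc import Callable, Iterable, Iterator


def profile_iterations(
    batches: Iterable,
    train_step: Callable[[object], None],
    max_iters: int = 50,
) -> dict:
    """Split average per-iteration wall time into data-wait and compute time.

    `batches` is any iterable of batches (e.g. a DataLoader) and `train_step`
    runs forward/backward/optimizer for one batch. Both are placeholders here.
    """
    data_time = 0.0
    compute_time = 0.0
    it: Iterator = iter(batches)
    n = 0
    while n < max_iters:
        t0 = time.perf_counter()
        try:
            batch = next(it)  # time spent waiting on the input pipeline
        except StopIteration:
            break
        t1 = time.perf_counter()
        train_step(batch)  # time spent in the model step itself
        t2 = time.perf_counter()
        data_time += t1 - t0
        compute_time += t2 - t1
        n += 1
    if n == 0:
        return {"iters": 0, "data_s": 0.0, "compute_s": 0.0, "data_fraction": 0.0}
    total = data_time + compute_time
    return {
        "iters": n,
        "data_s": data_time / n,  # average seconds waiting for a batch
        "compute_s": compute_time / n,  # average seconds in train_step
        "data_fraction": data_time / total if total > 0 else 0.0,
    }


# Hypothetical usage: a loader that is slow to produce batches.
def slow_loader(n: int):
    for i in range(n):
        time.sleep(0.03)  # simulated slow read / decode / augmentation
        yield i


def fake_step(batch) -> None:
    time.sleep(0.002)  # simulated forward/backward pass


stats = profile_iterations(slow_loader(10), fake_step, max_iters=10)
print(stats)
```

If `data_fraction` is high on the slow instances but was low in earlier runs, the input pipeline (disk, network, or CPU-side preprocessing) is the more likely bottleneck, rather than the GPUs themselves.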

answered 2 years ago

