SageMaker training run for multiclass classification does not store the trained model


Hi,

I have trained a multiclass classification model using auto-ml.

I used:

  • Training image: sagemaker-xgboost:1.3-1-cpu-py
  • Instance type: ml.m5.12xlarge

The run completed 2 cross-validation folds before the time limit was reached. The resulting best model was not stored in the specified S3 location, even though the job is configured to store the model on termination.

In parallel, I have successfully trained other classification models with the same auto-ml template (Jupyter notebook), so I don't think it is a configuration or permission issue.

The main difference for this classification training is the higher number of labels, which is 1950. The allowed label limit for this algorithm is 2000.

I also repeated the run for this model candidate twice, with the same result: the model was not stored.

CloudWatch has no entries indicating problems creating or storing the model.

Thanks, Arthur

arthur
asked 2 years ago, 387 views
1 Answer
Accepted Answer

I solved the problem on my own:

  • Reduced the number of folds so that the algorithm needs less time to finish (set hyperparameter _kfold: 2).

  • Another possibility would be to increase the time the algorithm is allowed to run, so that it can finish.

After giving the algorithm enough time to finish, it completed and also stored the model in S3.

So the problem was storing the model on termination: I suppose the default grace period of 120 seconds was not enough.
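
For anyone who wants to apply one of the two options in code, here is a minimal sketch. It only builds the request dictionary and does not call AWS. `StoppingCondition.MaxRuntimeInSeconds` and the string-valued `HyperParameters` map are fields of the SageMaker CreateTrainingJob request; `_kfold` is the hyperparameter from my notebook. The helper name, the job name, and the starting values are hypothetical, and I am assuming your notebook passes `_kfold` through `HyperParameters` in the same way.

```python
import copy


def adjust_training_request(request, max_runtime_seconds=None, kfold=None):
    """Return a copy of a CreateTrainingJob-style request with adjusted limits.

    Hypothetical helper: it only edits a dict and never contacts AWS.
    """
    updated = copy.deepcopy(request)
    if max_runtime_seconds is not None:
        # Option 2: allow the job to run longer before it is stopped.
        stopping = updated.setdefault("StoppingCondition", {})
        stopping["MaxRuntimeInSeconds"] = int(max_runtime_seconds)
    if kfold is not None:
        # Option 1: fewer cross-validation folds, so less total work.
        # Hyperparameter values are passed as strings.
        updated.setdefault("HyperParameters", {})["_kfold"] = str(kfold)
    return updated


# Hypothetical starting request; the values are placeholders, not my real job.
base_request = {
    "TrainingJobName": "multiclass-xgb-example",
    "HyperParameters": {"_kfold": "5"},
    "StoppingCondition": {"MaxRuntimeInSeconds": 3600},
}

fewer_folds = adjust_training_request(base_request, kfold=2)
more_time = adjust_training_request(base_request, max_runtime_seconds=4 * 3600)
```

Either adjusted dictionary could then be merged into the full request that the notebook sends. Which option is better depends on whether you can accept fewer folds or would rather pay for a longer run.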

arthur
answered 2 years ago