SageMaker training run for multiclass classification does not store the trained model


Hi,

I have trained a multiclass classification model using AutoML.

Setup used:

  • Training image: sagemaker-xgboost:1.3-1-cpu-py
  • Instance type: ml.m5.12xlarge

The run completed 2 cross-validation folds before the time limit was reached, but the resulting best model was not stored in the specified S3 location. The job is configured to store the model on termination.

In parallel, I have successfully trained other classifiers with the same AutoML template (Jupyter notebook), so I don't think it is a configuration or permission issue.

The main difference for this classification training is the higher number of labels, which is 1950. The allowed label limit for this algorithm is 2000.

I also repeated the run for this model candidate twice, with the same result: the model was not stored.

CloudWatch has no entries indicating problems creating or storing the model.

Thanks, Arthur

arthur
asked 2 years ago, 375 views
1 Answer
Accepted Answer

I solved the problem on my own:

  • Reduced the number of folds to shorten the time the algorithm needs to finish (set hyperparameter _kfold: 2).

  • Another possibility would be to increase the maximum time the algorithm is allowed to run, so that it can finish.

After giving the algorithm enough time to finish, it completed and also stored the model in S3.
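For anyone hitting the same issue, here is a minimal sketch of where both settings live when the job is submitted through boto3's `create_training_job`. The function name, the default runtime of 4 hours, the volume size, and all placeholder values are assumptions for illustration only. Input channels and the remaining hyperparameters from the AutoML notebook are left out and would need to be added.

```python
def build_training_job_request(job_name, image_uri, role_arn, output_s3_uri,
                               kfold=2, max_runtime_seconds=4 * 3600):
    """Build keyword arguments shaped for SageMaker's create_training_job.

    Hypothetical helper: only the fields relevant to the two fixes are
    spelled out; InputDataConfig and other hyperparameters are omitted.
    """
    return {
        "TrainingJobName": job_name,
        "AlgorithmSpecification": {
            "TrainingImage": image_uri,
            "TrainingInputMode": "File",
        },
        "RoleArn": role_arn,
        "OutputDataConfig": {"S3OutputPath": output_s3_uri},
        "ResourceConfig": {
            "InstanceType": "ml.m5.12xlarge",
            "InstanceCount": 1,
            "VolumeSizeInGB": 50,  # assumed value, not from the original post
        },
        # Fix 1: fewer cross-validation folds (hyperparameter values are strings).
        "HyperParameters": {"_kfold": str(kfold)},
        # Fix 2: a longer time limit so the job can finish and write the model.
        "StoppingCondition": {"MaxRuntimeInSeconds": max_runtime_seconds},
    }


if __name__ == "__main__":
    request = build_training_job_request(
        job_name="example-job",                      # placeholder
        image_uri="<training-image-uri>",            # placeholder
        role_arn="<execution-role-arn>",             # placeholder
        output_s3_uri="s3://<bucket>/<prefix>",      # placeholder
    )
    print(request["HyperParameters"], request["StoppingCondition"])
    # To submit (requires boto3 and AWS credentials):
    # import boto3
    # boto3.client("sagemaker").create_training_job(**request)
```

If the job is launched through the SageMaker Python SDK instead, the same two knobs should correspond to the estimator's hyperparameters and its maximum runtime setting.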

So the problem was storing the model on termination: I suppose the default time of 120 seconds was not enough to save it.
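To confirm that a run really was cut off by the time limit, the job description can be inspected. The sketch below assumes the dictionary returned by boto3's `describe_training_job`, where a job stopped by the limit is reported with `TrainingJobStatus` "Stopped" and `SecondaryStatus` "MaxRuntimeExceeded". The helper name is hypothetical.

```python
def stopped_by_time_limit(description):
    """Return True if a training job description shows a time-limit stop.

    `description` is expected to look like the response of
    boto3.client("sagemaker").describe_training_job(...).
    """
    status = description.get("TrainingJobStatus")
    secondary = description.get("SecondaryStatus")
    return status == "Stopped" and secondary == "MaxRuntimeExceeded"


if __name__ == "__main__":
    # Hand-written example dictionaries, not real API output.
    print(stopped_by_time_limit(
        {"TrainingJobStatus": "Stopped", "SecondaryStatus": "MaxRuntimeExceeded"}
    ))
    print(stopped_by_time_limit(
        {"TrainingJobStatus": "Completed", "SecondaryStatus": "Completed"}
    ))
    # Against a real job (requires boto3 and AWS credentials):
    # import boto3
    # desc = boto3.client("sagemaker").describe_training_job(
    #     TrainingJobName="<job-name>")
    # print(stopped_by_time_limit(desc))
```

A True result would be consistent with the explanation above, although it does not by itself prove that the model upload was what ran out of time.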

arthur
answered 2 years ago
