SageMaker training run for multiclass classification does not store the trained model


Hi,

I have trained a multiclass classification model using auto-ml.

I used:

  • Training image: sagemaker-xgboost:1.3-1-cpu-py
  • Instance type: ml.m5.12xlarge

The run completed 2 cross-validation folds before the time limit was reached, but the resulting best model was not stored in the specified S3 location. The job is configured to store the model on termination.

In parallel, I have successfully trained other classifiers with the same auto-ml template (a Jupyter notebook), so I don't think it is a configuration or permission issue.

The main difference for this classification training is the higher number of labels, which is 1950. The allowed label limit for this algorithm is 2000.

I also reran this model candidate twice with the same result: the model was not stored.

CloudWatch has no entries indicating problems creating or storing the model.

Thanks, Arthur

arthur
asked 2 years ago, 387 views
1 answer
Accepted answer

I solved the problem on my own:

  • I reduced the number of folds so that the algorithm needs less time to finish (set hyperparameter _kfold: 2).

  • Another possibility would be to increase the time limit so that the algorithm has enough time to finish.

Once the algorithm had enough time to finish, the job completed and also stored the model in S3.

So the problem was storing the model on termination: I suppose the default time of 120 seconds was not enough.
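For anyone who wants to apply either of the two options programmatically, here is a minimal sketch of the relevant request fragments. It rests on some assumptions: `_kfold` is the hyperparameter name from my notebook, `MaxRuntimeInSeconds` is the time limit field inside `StoppingCondition` of the SageMaker CreateTrainingJob request, and the helper name `build_training_overrides` and the 6-hour value are purely illustrative. The sketch only builds dictionaries and does not call AWS.

```python
def build_training_overrides(kfold=2, max_runtime_seconds=4 * 3600):
    """Build the request fragments for the two workarounds.

    Hypothetical helper: it returns plain dicts that would still have to be
    merged into a full CreateTrainingJob request by the caller.
    """
    if kfold < 2:
        raise ValueError("cross-validation needs at least 2 folds")
    if max_runtime_seconds <= 0:
        raise ValueError("the time limit must be positive")
    return {
        # Hyperparameter values are passed to SageMaker as strings.
        "HyperParameters": {"_kfold": str(kfold)},
        # A longer limit lets the job finish instead of being stopped early.
        "StoppingCondition": {"MaxRuntimeInSeconds": int(max_runtime_seconds)},
    }


# Example: keep 2 folds and allow up to 6 hours (illustrative value).
overrides = build_training_overrides(kfold=2, max_runtime_seconds=6 * 3600)
print(overrides)
```

If you use the SageMaker Python SDK instead of the low-level API, the `max_run` argument of the Estimator should play the same role as `MaxRuntimeInSeconds`.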

arthur
answered 2 years ago
