SageMaker training run for multiclass classification does not store the trained model


Hi,

I have trained a multiclass classification model using auto-ml.

I used:

  • Training image: sagemaker-xgboost:1.3-1-cpu-py
  • Instance type: ml.m5.12xlarge

The run completed 2 cross-validation folds before the time limit was reached. The resulting best model was not stored in the specified S3 location, even though the job is configured to store the model on termination.

In parallel, I have successfully trained other classification models with the same auto-ml template (Jupyter notebook), so I don't think it is a configuration or permission issue.

The main difference for this classification training is the higher number of labels, which is 1950. The allowed label limit for this algorithm is 2000.

I also repeated the run for this model candidate twice, with the same result: the model was not stored.

CloudWatch has no entries about problems creating or storing the model.

Thanks, Arthur

arthur
Asked 2 years ago, 387 views
1 Answer
Accepted Answer

I solved the problem on my own:

  • I reduced the number of folds so that the algorithm needs less time to finish (set hyperparameter _kfold: 2).

  • Another possibility would be to increase the time the algorithm is allowed to run, so that it can finish.

After giving the algorithm enough time to finish, it completed and also stored the model in s3.
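
The two options above can be sketched as changes to the training job request. This is only a rough sketch, assuming the job is described by a CreateTrainingJob-style request dictionary. The helper `adjust_request` and the starting values (5 folds, a one-hour limit) are hypothetical; only `_kfold: 2` comes from my run. `StoppingCondition.MaxRuntimeInSeconds` is the runtime limit field of the CreateTrainingJob API.

```python
import copy


def adjust_request(request, kfold=None, max_runtime_seconds=None):
    """Return a copy of a CreateTrainingJob-style request with fewer folds
    and/or a longer runtime limit. The original request is not modified."""
    updated = copy.deepcopy(request)
    if kfold is not None:
        # Hyperparameter values are passed to SageMaker as strings.
        updated.setdefault("HyperParameters", {})["_kfold"] = str(kfold)
    if max_runtime_seconds is not None:
        # Runtime limit of the training job, in seconds.
        stopping = updated.setdefault("StoppingCondition", {})
        stopping["MaxRuntimeInSeconds"] = int(max_runtime_seconds)
    return updated


# Hypothetical starting point; a real request has many more fields.
base_request = {
    "HyperParameters": {"_kfold": "5"},
    "StoppingCondition": {"MaxRuntimeInSeconds": 3600},
}

# Option 1: fewer folds, same time limit.
fewer_folds = adjust_request(base_request, kfold=2)

# Option 2: same folds, longer time limit (4 hours here, as an example).
more_time = adjust_request(base_request, max_runtime_seconds=4 * 3600)

print(fewer_folds)
print(more_time)
```

Either variant may be enough on its own; which one fits depends on whether the extra folds or the shorter runtime matters more for the use case.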

So the problem was storing the model on termination: I suppose the default time of 120 seconds allowed on termination was not enough.
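
To check whether a run ended this way, the job description can be inspected. The snippet below is a sketch, assuming a dictionary shaped like the DescribeTrainingJob response (for example from `boto3.client("sagemaker").describe_training_job(TrainingJobName=...)`). The helper name and the sample values are made up for illustration. It only reads the status fields and does not verify that a model file actually exists in S3.

```python
def stopped_by_time_limit(description):
    """Return True if the job description reports that the training job
    was stopped because it exceeded its maximum runtime."""
    return (
        description.get("TrainingJobStatus") == "Stopped"
        and description.get("SecondaryStatus") == "MaxRuntimeExceeded"
    )


# Hypothetical sample descriptions, reduced to the fields used above.
timed_out_job = {
    "TrainingJobStatus": "Stopped",
    "SecondaryStatus": "MaxRuntimeExceeded",
}
finished_job = {
    "TrainingJobStatus": "Completed",
    "SecondaryStatus": "Completed",
}

for name, job in [("timed_out_job", timed_out_job), ("finished_job", finished_job)]:
    if stopped_by_time_limit(job):
        print(name, "hit the runtime limit; check S3 for the model artifact")
    else:
        print(name, "did not stop on the runtime limit")
```

If the helper returns True and the model is missing in S3, reducing the folds or raising the runtime limit as described above is worth trying.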

arthur
Answered 2 years ago
