HyperparameterTuner: give unique checkpoint_s3_uri to spawned TrainingJobs


I am implementing a SageMaker tuner, using the SageMaker Python SDK, for an existing custom estimator that runs on spot instances. Until now there was one estimator per submitted training job, so I could easily generate a unique S3 prefix for each training job.
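For context, the one-estimator-per-job setup can be sketched roughly as follows. This is a minimal illustration, not my actual code: the helper `unique_checkpoint_uri`, the bucket name, and the prefix are hypothetical. The resulting URI is what gets passed as `checkpoint_s3_uri` when constructing each estimator.

```python
import uuid


def unique_checkpoint_uri(bucket: str, base_prefix: str, job_name: str) -> str:
    """Build an S3 URI that is unique per training job (hypothetical helper)."""
    # A short random suffix keeps prefixes distinct even if job names repeat.
    suffix = uuid.uuid4().hex[:8]
    prefix = base_prefix.strip("/")
    return f"s3://{bucket}/{prefix}/{job_name}-{suffix}"


# Hypothetical bucket, prefix, and job name, for illustration only.
uri = unique_checkpoint_uri("my-bucket", "checkpoints", "my-training-job")
print(uri)
```

This works because the URI is computed before each estimator is created; with a single estimator shared by the tuner, there is no equivalent point at which to compute it per job.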

Since the tuner takes a pre-initialized estimator object, it seems I can no longer assign a unique S3 prefix (checkpoint_s3_uri) to each spawned training job. Every training job would therefore have to download all previously exported checkpoints before it can start.

While I could hack some cleanup code into the training script, and then only have to download checkpoints of currently active training jobs, it seems to me that there must be a better way I am not seeing.
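The kind of training-script workaround I have in mind would look something like the sketch below. It assumes the container exposes the job name through the `TRAINING_JOB_NAME` environment variable and that checkpoints are written under `/opt/ml/checkpoints`; both are assumptions about the environment, and the function names are hypothetical. It only lets each job find its own checkpoints inside the shared prefix; it does not avoid the initial download.

```python
import os
from pathlib import Path
from typing import Optional


def job_checkpoint_dir(root: str = "/opt/ml/checkpoints") -> Path:
    """Return a per-job subdirectory inside the shared checkpoint directory.

    Assumes TRAINING_JOB_NAME is set in the container; falls back to a
    placeholder name for local runs.
    """
    job_name = os.environ.get("TRAINING_JOB_NAME", "local-run")
    return Path(root) / job_name


def latest_checkpoint(directory: Path) -> Optional[Path]:
    """Return the most recently modified file in `directory`, or None."""
    if not directory.is_dir():
        return None
    files = sorted(
        (p for p in directory.iterdir() if p.is_file()),
        key=lambda p: p.stat().st_mtime,
    )
    return files[-1] if files else None
```

A restarted spot job keeps the same job name, so it would resume from its own subdirectory and ignore checkpoints written by other tuning jobs.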

xref: EstimatorBase, HyperparameterTuner.

Chris
Asked 2 years ago, viewed 70 times
No answers
