HyperparameterTuner: give unique checkpoint_s3_uri to spawned TrainingJobs

I am implementing a SageMaker tuner, using the SageMaker Python SDK, for an existing custom estimator that runs on spot instances. Until now there has been one estimator per submitted training job, so I could easily generate a unique S3 prefix for each training job's checkpoints.
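The per-job setup described above might look roughly like the following. This is only a sketch under assumptions: the helper `unique_checkpoint_uri`, the bucket name, and the prefix are hypothetical and not taken from the original code. `checkpoint_s3_uri`, `use_spot_instances`, `max_run`, and `max_wait` are existing `Estimator` arguments in the SageMaker Python SDK, shown here only in a comment.

```python
import uuid


def unique_checkpoint_uri(bucket: str, base_prefix: str, job_label: str) -> str:
    """Build an S3 URI that is unique per call (hypothetical helper).

    A short random suffix keeps two jobs with the same label from
    sharing a checkpoint prefix.
    """
    suffix = uuid.uuid4().hex[:8]
    return f"s3://{bucket}/{base_prefix.strip('/')}/{job_label}-{suffix}"


# One estimator per training job, each with its own checkpoint prefix.
# Bucket and prefix names are placeholders.
uri = unique_checkpoint_uri("my-bucket", "checkpoints", "run")

# The URI would then be passed when constructing the estimator, e.g.:
#   Estimator(..., use_spot_instances=True, max_run=3600, max_wait=7200,
#             checkpoint_s3_uri=uri)
```

With one estimator per job this works because the URI is fixed at construction time, which is exactly what stops working once a single estimator is handed to the tuner.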

Since the tuner takes a pre-initialized estimator object, it seems I can no longer assign a unique S3 prefix to each spawned training job. All jobs would then share the same checkpoint_s3_uri, and every training job would have to download all previously exported checkpoints before it can start.

While I could hack some cleanup code into the training script, and then only have to download checkpoints of currently active training jobs, it seems to me that there must be a better way I am not seeing.
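One form such a training-script workaround could take is to namespace checkpoints by job inside the shared prefix. The sketch below assumes that the training container exposes the job name in the `TRAINING_JOB_NAME` environment variable and that checkpoints are synced from the default local path `/opt/ml/checkpoints`; the function names are hypothetical. It does not avoid the initial download of other jobs' checkpoints, it only keeps a restarted job from resuming from someone else's files.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def job_checkpoint_dir(
    root: str = "/opt/ml/checkpoints",
    env: Mapping[str, str] = os.environ,
) -> Path:
    """Return a per-job subdirectory of the shared checkpoint root.

    Assumes TRAINING_JOB_NAME is set in the container; falls back to a
    fixed name for local runs.
    """
    job_name = env.get("TRAINING_JOB_NAME", "local-run")
    return Path(root) / job_name


def latest_checkpoint(directory: Path) -> Optional[Path]:
    """Return the most recently modified file in `directory`, or None."""
    if not directory.is_dir():
        return None
    files = [p for p in directory.iterdir() if p.is_file()]
    return max(files, key=lambda p: p.stat().st_mtime, default=None)
```

In the training script, the directory would be created with `job_checkpoint_dir().mkdir(parents=True, exist_ok=True)` before the first save, and `latest_checkpoint` would be consulted at startup to decide whether to resume after a spot interruption.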

xref: EstimatorBase, HyperparameterTuner.

Chris
Asked 2 years ago, 70 views
No answers
