HyperparameterTuner: give unique checkpoint_s3_uri to spawned TrainingJobs


I am implementing a SageMaker tuner, using the SageMaker Python SDK, for an existing custom estimator that runs on spot instances. Until now there has been one estimator per submitted training job, so I could easily generate a unique S3 prefix for each training job.
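For reference, the per-job prefix generation looks roughly like the sketch below. The helper name, bucket, and prefix layout are made up for illustration; only `checkpoint_s3_uri` is an actual estimator argument.

```python
import uuid


def unique_checkpoint_uri(bucket: str, base_prefix: str, job_label: str) -> str:
    """Build an S3 URI that no other training job shares.

    `bucket`, `base_prefix` and `job_label` are illustrative inputs,
    not names taken from the SDK.
    """
    # A short random suffix keeps two jobs with the same label apart.
    suffix = uuid.uuid4().hex[:8]
    parts = [base_prefix.strip("/"), f"{job_label}-{suffix}"]
    # Drop an empty base prefix so the URI never contains "//" in the key.
    prefix = "/".join(p for p in parts if p)
    return f"s3://{bucket}/{prefix}/"
```

The returned string is what I pass as `checkpoint_s3_uri` when constructing each estimator, which only works because every job gets its own estimator object.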

Since the tuner takes a pre-initialized estimator object, it seems I can no longer assign a unique S3 prefix to each spawned training job. All jobs would then share one `checkpoint_s3_uri`, so every training job would have to download all previously exported checkpoints before it can start.

I could hack some cleanup code into the training script, so that a new job only has to download the checkpoints of currently active training jobs, but it seems there must be a better way that I am not seeing.
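To make the workaround concrete, here is a minimal sketch of the script-side part, under two assumptions: the job name is readable from the `TRAINING_JOB_NAME` environment variable inside the container, and checkpoints are synced from the default local path `/opt/ml/checkpoints`. The function names are hypothetical, and the actual deletion of stale objects from S3 is not shown.

```python
import os
from pathlib import Path


def own_checkpoint_dir(root, job_name):
    """Return (and create) the subdirectory this job writes checkpoints to."""
    path = Path(root) / job_name
    path.mkdir(parents=True, exist_ok=True)
    return path


def foreign_checkpoint_dirs(root, job_name):
    """List subdirectories under `root` that belong to other jobs."""
    root = Path(root)
    if not root.is_dir():
        return []
    return sorted(p for p in root.iterdir() if p.is_dir() and p.name != job_name)


def current_job_name(default="local-run"):
    """Read the job name from the environment (assumed variable name)."""
    return os.environ.get("TRAINING_JOB_NAME", default)
```

This only keeps jobs from resuming from each other's checkpoints and identifies what a cleanup step would have to remove. With a shared prefix, the initial download of everything under it still happens, which is exactly the part I would like to avoid.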

xref: EstimatorBase, HyperparameterTuner.

asked 2 years ago · 70 views
No answers