HyperparameterTuner: give unique checkpoint_s3_uri to spawned TrainingJobs


I am implementing a SageMaker tuner, using the SageMaker Python SDK, for an existing custom estimator that runs on spot instances. Until now there has been one estimator per submitted training job, so I could easily generate a unique S3 prefix for each training job.
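For illustration, here is a minimal sketch of how such a per-job prefix can be generated. The helper name `unique_checkpoint_uri`, the bucket, and the base prefix are hypothetical placeholders, not my actual code.

```python
import uuid
from datetime import datetime, timezone


def unique_checkpoint_uri(bucket: str, base_prefix: str, job_name: str) -> str:
    """Build an S3 URI that no other training job shares (hypothetical helper)."""
    # A UTC timestamp keeps the prefixes sortable in the S3 console.
    stamp = datetime.now(timezone.utc).strftime("%Y%m%d-%H%M%S")
    # A short random suffix avoids collisions between jobs started in the same second.
    suffix = uuid.uuid4().hex[:8]
    return f"s3://{bucket}/{base_prefix.strip('/')}/{job_name}-{stamp}-{suffix}/checkpoints"


# Placeholder bucket and prefix, for illustration only.
uri_a = unique_checkpoint_uri("my-bucket", "spot-training", "my-job")
uri_b = unique_checkpoint_uri("my-bucket", "spot-training", "my-job")
```

Each URI is then passed as `checkpoint_s3_uri` when constructing the estimator for that one job, which works only because every job gets its own estimator.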

Because the tuner takes a single pre-initialized estimator object, it seems I can no longer assign a unique S3 prefix to each spawned training job. All of them would share the same checkpoint_s3_uri, so every training job would have to download all previously exported checkpoints before it can start.

I could hack some cleanup code into the training script, so that only the checkpoints of currently active training jobs have to be downloaded, but it seems to me that there must be a better way that I am not seeing.

xref: EstimatorBase, HyperparameterTuner.

No answers
