HyperparameterTuner: give unique checkpoint_s3_uri to spawned TrainingJobs

I am implementing a SageMaker tuner, using the SageMaker Python SDK, for an existing custom estimator that runs on spot instances. Until now there has been one estimator per submitted training job, so I could easily generate a unique S3 prefix for each training job.
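To illustrate the one-estimator-per-job setup, here is a minimal sketch of how a unique prefix can be generated for each job. The helper name, bucket, and base prefix are placeholders for illustration, not the actual setup; the resulting URI is the kind of value that would be passed as `checkpoint_s3_uri` when constructing each estimator.

```python
import uuid
from datetime import datetime, timezone


def unique_checkpoint_uri(bucket: str, base_prefix: str, job_label: str) -> str:
    """Build an S3 URI that is unique for each submitted training job.

    All arguments are hypothetical placeholders; adapt them to your own layout.
    """
    # A UTC timestamp keeps prefixes sortable; the random suffix avoids
    # collisions when two jobs are submitted within the same second.
    timestamp = datetime.now(timezone.utc).strftime("%Y%m%d-%H%M%S")
    suffix = uuid.uuid4().hex[:8]
    leaf = f"{job_label}-{timestamp}-{suffix}"
    # Drop empty parts and stray slashes so the key never contains "//".
    parts = [p.strip("/") for p in (base_prefix, leaf) if p.strip("/")]
    return f"s3://{bucket}/" + "/".join(parts)


# Example: one fresh URI per estimator / training job.
uri = unique_checkpoint_uri("my-bucket", "checkpoints", "custom-estimator")
print(uri)
```

This only works because the URI is computed once per estimator, right before the job is submitted.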

The tuner, however, takes a pre-initialized estimator object, so as far as I can tell I can no longer assign a unique S3 prefix to each spawned training job. All jobs would then share the same checkpoint_s3_uri, and every training job would have to download all previously exported checkpoints before it can start.

While I could hack some cleanup code into the training script, and then only have to download checkpoints of currently active training jobs, it seems to me that there must be a better way I am not seeing.
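A rough sketch of what such a cleanup hack could look like, called at the end of the training script. It rests on several assumptions that are not confirmed here: that each job names its checkpoints with a job-specific prefix, that the job name is available to the script (for example through a `TRAINING_JOB_NAME` environment variable), and that deleting files from the local checkpoint directory also removes them from the shared S3 location. The helper name is hypothetical.

```python
import shutil
from pathlib import Path
from typing import List


def remove_job_checkpoints(checkpoint_dir: str, job_name: str) -> List[str]:
    """Delete entries in checkpoint_dir whose names start with job_name.

    Returns the sorted names of the removed entries.
    """
    # Guard: an empty job name would match (and delete) everything.
    if not job_name:
        return []
    removed = []
    for entry in Path(checkpoint_dir).iterdir():
        if not entry.name.startswith(job_name):
            continue  # belongs to another job, leave it alone
        if entry.is_dir():
            shutil.rmtree(entry)
        else:
            entry.unlink()
        removed.append(entry.name)
    return sorted(removed)


# Hypothetical usage at the end of training (assumed path and variable):
# import os
# remove_job_checkpoints("/opt/ml/checkpoints",
#                        os.environ.get("TRAINING_JOB_NAME", ""))
```

Even if those assumptions hold, this only reduces the download to the checkpoints of jobs that are still running; it does not give each job its own prefix.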

xref: EstimatorBase, HyperparameterTuner.

Chris
asked 2 years ago · 70 views
No answers