1 Answer
Hi olivier, if you enable SageMaker checkpointing, it periodically saves a copy of the artifacts to S3. I have used this with PyTorch, and it works by checkpointing periodically. The blog post "Managed Spot Training: Save Up to 90% On Your Amazon SageMaker Training Jobs" also mentions the same:
"To avoid restarting a training job from scratch should it be interrupted, we strongly recommend that you implement checkpointing, a technique that saves the model in training at periodic intervals."
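To make the idea concrete, here is a minimal, framework-agnostic sketch of the training-script side of checkpointing: save state after every epoch and resume from the newest checkpoint on restart. The helper names (`save_checkpoint`, `load_latest_checkpoint`, `train`), the JSON file layout, and the `stop_after` parameter are all hypothetical and only there for illustration; a real PyTorch script would save the model and optimizer state instead. The sketch assumes the checkpoint directory is `/opt/ml/checkpoints`, which is the default local path that SageMaker syncs to S3 when checkpointing is enabled.

```python
import json
import os
import tempfile
from pathlib import Path

# Assumption: SageMaker syncs this local directory to the configured S3
# location; /opt/ml/checkpoints is the default local checkpoint path.
DEFAULT_CHECKPOINT_DIR = "/opt/ml/checkpoints"


def save_checkpoint(ckpt_dir, epoch, state):
    """Write the state for a finished epoch (hypothetical helper)."""
    ckpt_dir = Path(ckpt_dir)
    ckpt_dir.mkdir(parents=True, exist_ok=True)
    final_path = ckpt_dir / f"epoch-{epoch:04d}.json"
    # Write to a temporary file first, then rename, so a half-written
    # checkpoint is never mistaken for a complete one.
    fd, tmp_path = tempfile.mkstemp(dir=ckpt_dir, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump({"epoch": epoch, "state": state}, f)
    os.replace(tmp_path, final_path)
    return final_path


def load_latest_checkpoint(ckpt_dir):
    """Return the newest checkpoint as a dict, or None if there is none."""
    ckpt_dir = Path(ckpt_dir)
    if not ckpt_dir.is_dir():
        return None
    # Zero-padded epoch numbers make lexicographic order match epoch order.
    candidates = sorted(ckpt_dir.glob("epoch-*.json"))
    if not candidates:
        return None
    with open(candidates[-1]) as f:
        return json.load(f)


def train(ckpt_dir=DEFAULT_CHECKPOINT_DIR, total_epochs=5, stop_after=None):
    """Toy training loop that resumes from the latest checkpoint.

    stop_after is a hypothetical knob that simulates an interruption
    (for example, a Spot instance being reclaimed) before that epoch runs.
    """
    ckpt = load_latest_checkpoint(ckpt_dir)
    start_epoch = ckpt["epoch"] + 1 if ckpt else 0
    state = ckpt["state"] if ckpt else {"steps": 0}
    for epoch in range(start_epoch, total_epochs):
        if stop_after is not None and epoch >= stop_after:
            return state  # simulated interruption
        state["steps"] += 1  # stand-in for one epoch of real training
        save_checkpoint(ckpt_dir, epoch, state)
    return state
```

On the job-configuration side, the SageMaker Python SDK `Estimator` accepts a `checkpoint_s3_uri` argument (and optionally `checkpoint_local_path`); with that set, files written to the local checkpoint directory are copied to S3, and they are copied back into the same directory when the job restarts, so a resume-from-latest loop like the one above can pick them up.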