1 Answer
[The following answer has been translated] Hi Olivier,
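If you enable SageMaker checkpointing, it periodically saves copies of the training artifacts to S3. I have used this feature with PyTorch, where it works by writing checkpoints at regular intervals. The blog post "Managed Spot Training: Save Up to 90% On Your Amazon SageMaker Training Jobs" describes the same approach.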
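For illustration, here is a minimal sketch of the checkpoint-related settings you would pass to an estimator in the SageMaker Python SDK. The parameter names (`checkpoint_s3_uri`, `checkpoint_local_path`, `use_spot_instances`, `max_run`, `max_wait`) come from the SDK's `Estimator`; the helper function, the bucket and prefix, and the time limits are placeholders I made up for the example.

```python
def checkpoint_kwargs(bucket: str, prefix: str, use_spot: bool = True) -> dict:
    """Build the checkpoint-related keyword arguments for a SageMaker estimator.

    Hypothetical helper: it only assembles a dict and does not call AWS.
    """
    kwargs = {
        # S3 location that SageMaker syncs checkpoints to (placeholder values).
        "checkpoint_s3_uri": f"s3://{bucket}/{prefix.strip('/')}",
        # Directory inside the training container that is synced to S3;
        # /opt/ml/checkpoints is the SageMaker default.
        "checkpoint_local_path": "/opt/ml/checkpoints",
    }
    if use_spot:
        kwargs.update(
            use_spot_instances=True,
            max_run=3600,   # training time limit in seconds (illustrative)
            max_wait=7200,  # must be >= max_run; also covers waiting for Spot capacity
        )
    return kwargs


if __name__ == "__main__":
    print(checkpoint_kwargs("my-bucket", "jobs/demo/checkpoints"))
    # With the SageMaker Python SDK installed, the dict could be unpacked into
    # an estimator alongside your usual arguments, for example:
    #   from sagemaker.pytorch import PyTorch
    #   estimator = PyTorch(entry_point="train.py", ..., **checkpoint_kwargs(...))
```

Your training script then only needs to write its checkpoint files into the local checkpoint directory; SageMaker takes care of copying them to S3.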
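To avoid having to start over from scratch when a training job is interrupted, we strongly recommend enabling checkpointing so that the model being trained is saved periodically.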
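Checkpointing only helps if the training script also looks for an existing checkpoint at startup. When a job restarts, SageMaker copies the checkpoints from S3 back into the local checkpoint directory, so the script can pick up from the latest one. Below is a rough sketch of that save-and-resume pattern. To keep it runnable anywhere it uses JSON files and a dummy training step; in a real PyTorch script you would typically use `torch.save` and `torch.load` on the model and optimizer state instead. All function names and the file naming scheme are my own assumptions.

```python
import json
import os
import tempfile


def save_checkpoint(ckpt_dir, epoch, state):
    """Write one checkpoint file per epoch into ckpt_dir."""
    os.makedirs(ckpt_dir, exist_ok=True)
    final_path = os.path.join(ckpt_dir, f"epoch-{epoch:04d}.json")
    tmp_path = final_path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump({"epoch": epoch, "state": state}, f)
    # Rename after writing so a half-written file is never picked up on resume.
    os.replace(tmp_path, final_path)
    return final_path


def load_latest_checkpoint(ckpt_dir):
    """Return the most recent checkpoint as a dict, or None if there is none."""
    if not os.path.isdir(ckpt_dir):
        return None
    names = sorted(
        n for n in os.listdir(ckpt_dir)
        if n.startswith("epoch-") and n.endswith(".json")
    )
    if not names:
        return None
    with open(os.path.join(ckpt_dir, names[-1])) as f:
        return json.load(f)


def train(ckpt_dir, total_epochs):
    """Resume from the latest checkpoint if present, else start at epoch 0."""
    ckpt = load_latest_checkpoint(ckpt_dir)
    start_epoch = ckpt["epoch"] + 1 if ckpt else 0
    state = ckpt["state"] if ckpt else {"steps": 0}
    for epoch in range(start_epoch, total_epochs):
        state["steps"] += 1  # stand-in for one epoch of real training
        save_checkpoint(ckpt_dir, epoch, state)
    return start_epoch, state


if __name__ == "__main__":
    # In a SageMaker job this would be the checkpoint_local_path,
    # /opt/ml/checkpoints by default; a temp dir is used here for the demo.
    demo_dir = tempfile.mkdtemp()
    print(train(demo_dir, 3))  # first run starts at epoch 0
    print(train(demo_dir, 5))  # "restarted" run resumes after the last saved epoch
```

The second call simulates a restarted job: it finds the checkpoints left by the first run and continues from the next epoch instead of repeating the work.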