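You can run distributed training on spot instances: set use_spot_instances=True on the estimator in the SageMaker Python SDK, together with a max_wait timeout that is at least as long as max_run. Spot capacity can be reclaimed at any time, so also write periodic checkpoints (roughly once an hour) and set checkpoint_s3_uri so that an interrupted job can resume instead of starting over: https://docs.aws.amazon.com/sagemaker/latest/dg/model-checkpoints.html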
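As a rough illustration of the advice above, here is a minimal sketch, not a definitive implementation. The helper names (spot_estimator_kwargs, save_checkpoint, load_checkpoint, train), the JSON checkpoint format, and the example bucket are assumptions made for illustration. The estimator parameter names and the /opt/ml/checkpoints default directory come from the SageMaker Python SDK. Your real training script would save model and optimizer state with your framework's own API.

```python
import json
import os
import tempfile
import time

# Default local directory that SageMaker syncs to checkpoint_s3_uri.
DEFAULT_CHECKPOINT_DIR = "/opt/ml/checkpoints"


def spot_estimator_kwargs(checkpoint_s3_uri, max_run=24 * 3600, max_wait=36 * 3600):
    """Hypothetical helper: keyword arguments enabling managed spot training."""
    if max_wait < max_run:
        raise ValueError("max_wait must be >= max_run")
    return {
        "use_spot_instances": True,
        "max_run": max_run,      # seconds of training time allowed
        "max_wait": max_wait,    # seconds including time spent waiting for spot capacity
        "checkpoint_s3_uri": checkpoint_s3_uri,
    }


def save_checkpoint(state, checkpoint_dir=DEFAULT_CHECKPOINT_DIR):
    """Write state to a temp file, then rename it so readers never see a partial file."""
    os.makedirs(checkpoint_dir, exist_ok=True)
    fd, tmp_path = tempfile.mkstemp(dir=checkpoint_dir, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp_path, os.path.join(checkpoint_dir, "latest.json"))


def load_checkpoint(checkpoint_dir=DEFAULT_CHECKPOINT_DIR):
    """Return the last saved state, or None on a fresh start."""
    path = os.path.join(checkpoint_dir, "latest.json")
    if not os.path.exists(path):
        return None
    with open(path) as f:
        return json.load(f)


def train(total_steps, checkpoint_dir=DEFAULT_CHECKPOINT_DIR,
          checkpoint_every_s=3600, train_step=None):
    """Resume from the latest checkpoint and save again on a time interval."""
    state = load_checkpoint(checkpoint_dir) or {"step": 0}
    last_save = time.monotonic()
    while state["step"] < total_steps:
        if train_step is not None:
            train_step(state["step"])  # placeholder for one real training step
        state["step"] += 1
        if time.monotonic() - last_save >= checkpoint_every_s:
            save_checkpoint(state, checkpoint_dir)
            last_save = time.monotonic()
    save_checkpoint(state, checkpoint_dir)  # final checkpoint at the end of training
    return state
```

On the launcher side, the dictionary would be unpacked into the estimator alongside your usual arguments, for example Estimator(..., **spot_estimator_kwargs("s3://example-bucket/checkpoints")). In a distributed job you would typically let only one worker write checkpoints, though the right approach depends on your framework.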