Yes, you can use spot instances for training; I recommend them and always run training on spot instances myself. If you are using the SageMaker Python SDK, add the following parameters to your Estimator:

```
use_spot_instances=True,
max_run={maximum runtime in seconds},
max_wait={maximum wait time in seconds},
checkpoint_s3_uri={URI of your checkpoint bucket and folder},
```
See the documentation for more details here: https://docs.aws.amazon.com/sagemaker/latest/dg/model-managed-spot-training.html
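Putting those parameters together, here is a minimal sketch. The bucket name, timings, and the commented-out Estimator call are illustrative placeholders, not part of the original answer:

```python
# Hypothetical spot-training settings for a SageMaker Estimator.
# Bucket name and timings are illustrative placeholders.
spot_settings = {
    "use_spot_instances": True,
    "max_run": 3600,   # maximum training time, in seconds
    "max_wait": 7200,  # total time including waiting for spot capacity
    "checkpoint_s3_uri": "s3://example-bucket/training-checkpoints/",
}

# SageMaker requires max_wait >= max_run when spot instances are used,
# since max_wait covers both the wait for capacity and the training run.
assert spot_settings["max_wait"] >= spot_settings["max_run"]

# With the SageMaker Python SDK installed, these would be passed as:
#   estimator = sagemaker.estimator.Estimator(..., **spot_settings)
```

Checkpointing to S3 matters here because a spot instance can be reclaimed mid-run; with `checkpoint_s3_uri` set, training resumes from the last checkpoint instead of starting over.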
As far as instance types are concerned, the documentation for the individual algorithms contains initial recommendations for instance types: https://docs.aws.amazon.com/sagemaker/latest/dg/algos.html
For example, see the EC2 Instance Recommendation for the Image Classification Algorithm: https://docs.aws.amazon.com/sagemaker/latest/dg/image-classification.html
There was a presentation at re:Invent 2020 - How to choose the right instance type for ML inference: https://www.youtube.com/watch?v=0DSgXTN7ehg
Hope this helps
For selecting an instance type for inference, you might also want to look at Amazon SageMaker Inference Recommender:
https://docs.aws.amazon.com/sagemaker/latest/dg/inference-recommender.html
@Dennis_A - thanks. I see that the GPU instance types come with either 1 GPU or 8 GPUs; nothing in between.
SageMaker also has an Inference Recommender that helps you select the best instance type and configuration for hosting: https://docs.aws.amazon.com/sagemaker/latest/dg/inference-recommender.html