
Spot instances for inference and SageMaker?


Is it possible to deploy Spot Inf1 instances on SageMaker? We run an API 24/7, and it's costly to keep it up, considering we only have about 2 hours of peak traffic a day.

We don't shut those machines off because we might get random bursts of traffic during the day that CPU instances can't handle. Alternatively, we could deploy Spot EC2 Inf instances; however, I'm unsure how I would invoke them from API Gateway and Lambda. Does anybody have a tip or recommendation for our case?


1 Answer

You could integrate an EC2 Spot Fleet with the Application Auto Scaling service to add or remove Spot instances as traffic changes. If you want to scale down to 0 instances, you will need a queue to hold requests while the fleet scales from 0 to 1 or more instances. Your application would then insert each request into an SQS queue and wait for an instance to become available to process it. The AWS documentation on automatic scaling for Spot Fleet has more information on how to configure Application Auto Scaling with Spot instances.
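As a rough sketch of the "insert the requests in the SQS queue" step, a Lambda function behind API Gateway could look something like the following. This assumes a proxy integration (so the request body arrives in `event["body"]`) and an asynchronous flow in which the caller gets a request ID back and collects the result later. The `INFERENCE_QUEUE_URL` variable, the `build_message` helper, and the message layout are made up for illustration and are not from the original answer.

```python
import json
import os
import uuid


def build_message(body: str) -> dict:
    """Wrap the raw request body with an ID the caller can use to look up the result."""
    return {"request_id": str(uuid.uuid4()), "payload": body}


def handler(event, context, sqs_client=None):
    """Lambda entry point, assuming an API Gateway proxy integration."""
    if sqs_client is None:
        import boto3  # imported lazily so the module loads without AWS access
        sqs_client = boto3.client("sqs")

    message = build_message(event.get("body") or "")
    sqs_client.send_message(
        QueueUrl=os.environ.get("INFERENCE_QUEUE_URL", ""),  # hypothetical variable name
        MessageBody=json.dumps(message),
    )
    # 202 Accepted: the request has been queued, not processed yet.
    return {
        "statusCode": 202,
        "body": json.dumps({"request_id": message["request_id"]}),
    }
```

The workers on the Spot instances would poll the same queue and write results somewhere the client can fetch them by `request_id`; that part depends on your application and is left out here.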

To configure the scaling policy, you can use the SQS queue length metric (`ApproximateNumberOfMessagesVisible`) as the metric for a target tracking policy in Application Auto Scaling.
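A target tracking policy on queue length could be set up with boto3 roughly as shown below. This is a minimal sketch, not a tested production configuration: the Spot Fleet request ID, queue name, policy name, capacity limits, target backlog, and cooldown values are all placeholders you would replace with your own.

```python
def scalable_target(fleet_request_id: str, max_capacity: int) -> dict:
    """Arguments for register_scalable_target on a Spot Fleet request."""
    return {
        "ServiceNamespace": "ec2",
        "ResourceId": f"spot-fleet-request/{fleet_request_id}",
        "ScalableDimension": "ec2:spot-fleet-request:TargetCapacity",
        "MinCapacity": 0,  # allow the fleet to scale down to zero
        "MaxCapacity": max_capacity,
    }


def queue_tracking_policy(fleet_request_id: str, queue_name: str,
                          target_backlog: float) -> dict:
    """Arguments for put_scaling_policy, tracking the visible SQS backlog."""
    return {
        "PolicyName": "sqs-backlog-target-tracking",  # placeholder name
        "ServiceNamespace": "ec2",
        "ResourceId": f"spot-fleet-request/{fleet_request_id}",
        "ScalableDimension": "ec2:spot-fleet-request:TargetCapacity",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": target_backlog,
            "CustomizedMetricSpecification": {
                "MetricName": "ApproximateNumberOfMessagesVisible",
                "Namespace": "AWS/SQS",
                "Dimensions": [{"Name": "QueueName", "Value": queue_name}],
                "Statistic": "Average",
            },
            "ScaleOutCooldown": 60,   # seconds, placeholder
            "ScaleInCooldown": 300,   # seconds, placeholder
        },
    }


def apply_scaling(fleet_request_id: str, queue_name: str,
                  max_capacity: int = 4, target_backlog: float = 10.0) -> None:
    """Register the fleet as a scalable target and attach the policy."""
    import boto3  # imported lazily; requires AWS credentials at call time
    client = boto3.client("application-autoscaling")
    client.register_scalable_target(**scalable_target(fleet_request_id, max_capacity))
    client.put_scaling_policy(
        **queue_tracking_policy(fleet_request_id, queue_name, target_backlog)
    )
```

You would call something like `apply_scaling("sfr-...", "my-inference-queue")` once, with your own fleet request ID and queue name. The target value is the queue backlog the policy tries to maintain, so a lower value scales out more aggressively.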

answered 3 months ago
