Spot Fleet down scaling instance selection
What method is used to select candidate instances when manually scaling down a Spot Fleet? I'm managing a render fleet and would like to terminate only idle instances when scaling in the fleet. Is it possible to choose instances based on CloudWatch metrics, or to plug in a custom termination policy like the one Auto Scaling groups support?
Yes, CloudWatch metrics are available for Spot Fleet.
You can configure this by following the links below and choosing the metrics you need.
Spot fleet capacity rebalance: (Scale in/out) https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/spot-fleet-capacity-rebalance.html
Handling spot fleet interruptions: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/spot-interruptions.html
Custom termination policies are only available on Auto Scaling groups (ASGs), not on Spot Fleet, so you would need to migrate to an ASG to use them. For your use case, an ASG may work better. In Spot Fleet, when an instance needs to be terminated, the fleet selects it based on the allocation strategy.
On an ASG you could also use scale-in protection or terminating lifecycle hooks. The best option depends on factors such as how long the renders take and how often new jobs come in.
Here are a few high-level examples of when each feature might work best and how to use it:
- Custom Termination policy
This works best when the Lambda function has a way to determine which instances are idle, such as an API/HTTP request it can send to each instance.
- Scale-In Protection
If each instance runs only a single job, or runs very long jobs, this might work best. Enable the setting on the ASG itself so that all new instances launch with protection enabled. Once a render finishes, check whether there are any more jobs in the queue; if there aren't, disable protection on that instance so it can be scaled in.
- Terminating lifecycle hook
This works best for short jobs (at most 2 hours, since that is the maximum heartbeat timeout for a lifecycle hook). Put a terminating lifecycle hook on the ASG, and have the application check, whenever it finishes a job, whether its instance is currently in the Terminating:Wait state on the ASG. If it is, complete the lifecycle action to finish the termination process; if not, grab a new job.
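To make the three options above more concrete, here is a minimal Python sketch of each one. It is an illustration under assumptions, not a tested implementation: the function names, `is_idle`, `queue_depth`, `asg_name`, and `hook_name` are all hypothetical, and how you detect idleness or count queued jobs depends on your render manager. The `autoscaling` argument is expected to be a boto3 Auto Scaling client (`boto3.client("autoscaling")`), passed in so the functions can be exercised without AWS access.

```python
from typing import Callable


def choose_idle_instances(event: dict, is_idle: Callable[[str], bool]) -> dict:
    """Custom termination policy: offer only idle instances for termination.

    `event` is the payload the ASG sends to the policy Lambda function.
    `is_idle` is a hypothetical callback, e.g. an HTTP call to the render node.
    """
    idle_ids = [
        inst["InstanceId"]
        for inst in event.get("Instances", [])
        if is_idle(inst["InstanceId"])
    ]
    # The ASG expects the candidate instance IDs under the "InstanceIDs" key.
    return {"InstanceIDs": idle_ids}


def release_if_queue_empty(autoscaling, asg_name: str, instance_id: str,
                           queue_depth: int) -> bool:
    """Scale-in protection: drop protection once no jobs are left.

    `queue_depth` is a hypothetical count of jobs still waiting.
    Returns True if protection was removed.
    """
    if queue_depth > 0:
        return False  # more work is waiting, stay protected
    autoscaling.set_instance_protection(
        AutoScalingGroupName=asg_name,
        InstanceIds=[instance_id],
        ProtectedFromScaleIn=False,
    )
    return True


def finish_termination_if_waiting(autoscaling, asg_name: str, hook_name: str,
                                  instance_id: str) -> bool:
    """Terminating lifecycle hook: call this after each job finishes.

    Returns True if the instance was in Terminating:Wait and the hook was
    completed; False means the caller should pick up a new job.
    """
    resp = autoscaling.describe_auto_scaling_instances(InstanceIds=[instance_id])
    states = [i["LifecycleState"] for i in resp["AutoScalingInstances"]]
    if "Terminating:Wait" not in states:
        return False
    autoscaling.complete_lifecycle_action(
        AutoScalingGroupName=asg_name,
        LifecycleHookName=hook_name,
        InstanceId=instance_id,
        LifecycleActionResult="CONTINUE",
    )
    return True
```

In a real deployment, the first function would sit behind the Lambda handler attached to the ASG as its custom termination policy, while the other two would run in the render worker on each instance after a job completes.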