Spot Fleet down scaling instance selection

What method is used to select which instances are terminated when manually scaling down a Spot Fleet? I'm managing a render fleet and would like to terminate only idle instances when scaling in. Is it possible to choose instances based on CloudWatch metrics, or to insert a custom termination policy like the one Auto Scaling groups support?

2 Answers

Yes, CloudWatch metrics are available for Spot Fleet.

You can configure this by following the link below and choosing the metrics you need.

https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/spot-fleet-cloudwatch-metrics.html

Spot Fleet capacity rebalance (scale in/out): https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/spot-fleet-capacity-rebalance.html

Handling Spot Fleet interruptions: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/spot-interruptions.html
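As a rough sketch of reading one of these metrics programmatically, the code below asks CloudWatch for the fleet's average `CPUUtilization` in the `AWS/EC2Spot` namespace, filtered by the `FleetRequestId` dimension. The function name and its parameters are illustrative, not an AWS-provided helper. It assumes a boto3 CloudWatch client is passed in, for example `boto3.client("cloudwatch")`, and the lookback window and period are placeholder values. These Spot Fleet metrics are aggregated across the fleet, so they can indicate whether the fleet as a whole is idle, but not which individual instance is.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, List, Optional


def fleet_average_cpu(cloudwatch: Any, fleet_request_id: str,
                      minutes: int = 15, period: int = 300) -> Optional[float]:
    """Return the mean of the fleet's average CPUUtilization datapoints over
    the last `minutes`, or None if CloudWatch returned no data.

    `cloudwatch` is assumed to be a boto3 CloudWatch client (or any object
    exposing the same get_metric_statistics call).
    """
    end = datetime.now(timezone.utc)
    start = end - timedelta(minutes=minutes)
    response = cloudwatch.get_metric_statistics(
        Namespace="AWS/EC2Spot",  # namespace for Spot Fleet metrics
        MetricName="CPUUtilization",
        Dimensions=[{"Name": "FleetRequestId", "Value": fleet_request_id}],
        StartTime=start,
        EndTime=end,
        Period=period,
        Statistics=["Average"],
    )
    datapoints: List[dict] = response.get("Datapoints", [])
    if not datapoints:
        return None
    # Average the per-period averages into a single fleet-level number.
    return sum(dp["Average"] for dp in datapoints) / len(datapoints)
```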

answered 7 months ago

Custom termination policies are only available on Auto Scaling groups (ASGs), not on Spot Fleet, so you would need to migrate to an ASG to use them; for your use case, an ASG may work better anyway. In Spot Fleet, when an instance needs to be terminated, the fleet selects it based on the allocation strategy.

With an ASG, you could also use scale-in protection or terminating lifecycle hooks. The best option depends on factors such as how long the renders take and how often new jobs come in.

Here are a few high-level examples of when each feature might work best and how to use it:

  1. Custom termination policy[1]

This works if the Lambda function has a way to determine which instances are idle, such as an API/HTTP request that it can send to each instance.

  2. Scale-in protection[2]

This might work best if each instance runs only a single job, or runs very long jobs. Enable the setting on the ASG itself so that all new instances start with protection enabled. Then, once a render finishes, check whether there are any more jobs in the queue; if there aren't, disable protection to allow the instance to be scaled in.

  3. Terminating lifecycle hook[3]

This would work best for short jobs (up to 2 hours, since that is the maximum heartbeat timeout for a lifecycle hook). Put a terminating lifecycle hook on the ASG, and have the application check, whenever it finishes a job, whether the instance is currently in the Terminating:Wait state in the ASG. If it is, complete the lifecycle action to finish the termination process; if not, grab a new job.
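For option 1, here is a minimal sketch of a custom termination policy Lambda function in Python. It assumes the input event fields (`CapacityToTerminate`, `Instances`) and the `{"InstanceIDs": [...]}` response shape described in the custom termination policy guide [1], assumes each instance counts as one unit of capacity (no instance weights), and only matches candidates to `CapacityToTerminate` by Availability Zone. `instance_is_idle` is a hypothetical placeholder for whatever idle check you have; as written it reports every instance as busy, so nothing is selected until you replace it.

```python
from collections import defaultdict
from typing import Callable, Dict, List


def choose_idle_instances(event: dict,
                          is_idle: Callable[[str], bool]) -> Dict[str, List[str]]:
    """Pick idle instances from the candidates in the event, taking at most
    the requested capacity in each Availability Zone."""
    # How many instances the ASG wants to remove, per Availability Zone.
    wanted: Dict[str, int] = defaultdict(int)
    for entry in event.get("CapacityToTerminate", []):
        wanted[entry["AvailabilityZone"]] += entry["Capacity"]

    chosen: List[str] = []
    for instance in event.get("Instances", []):
        az = instance["AvailabilityZone"]
        if wanted[az] > 0 and is_idle(instance["InstanceId"]):
            chosen.append(instance["InstanceId"])
            wanted[az] -= 1
    # Busy instances are simply not included in the returned list.
    return {"InstanceIDs": chosen}


def instance_is_idle(instance_id: str) -> bool:
    """Hypothetical placeholder: replace with a real check, for example an
    HTTP request to the render worker or a lookup in the job queue.
    Returning False means "busy"."""
    return False


def lambda_handler(event, context):
    return choose_idle_instances(event, instance_is_idle)
```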
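For option 2, a minimal sketch of the "release protection when the queue is empty" step. It assumes the render worker knows its own instance ID and ASG name, and that `queue_is_empty` is a hypothetical callable you supply for your job queue. The `autoscaling` argument is assumed to be a boto3 Auto Scaling client (`boto3.client("autoscaling")`), whose `set_instance_protection` call toggles scale-in protection. Protection for new instances would be turned on at the group level, for example with `NewInstancesProtectedFromScaleIn=True` on `update_auto_scaling_group`.

```python
from typing import Any, Callable


def release_protection_if_queue_empty(autoscaling: Any, asg_name: str,
                                      instance_id: str,
                                      queue_is_empty: Callable[[], bool]) -> bool:
    """Call after a render finishes. If no jobs are waiting, turn off
    scale-in protection for this instance so the ASG may remove it.
    Returns True if protection was removed."""
    if not queue_is_empty():
        # Keep protection; the worker should pick up the next job instead.
        return False
    autoscaling.set_instance_protection(
        InstanceIds=[instance_id],
        AutoScalingGroupName=asg_name,
        ProtectedFromScaleIn=False,
    )
    return True
```

One design point to consider: if a new job can arrive after protection is removed, the worker could set `ProtectedFromScaleIn=True` again before starting it, so it is not scaled in mid-render.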
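For option 3, a minimal sketch of the check the application could run after each job. It assumes a boto3 Auto Scaling client, that the worker knows its own instance ID, and that `hook_name` (a hypothetical argument) is the name of your terminating lifecycle hook. It reads the instance's `LifecycleState` via `describe_auto_scaling_instances` and, if it is `Terminating:Wait`, calls `complete_lifecycle_action` with `CONTINUE` so termination proceeds.

```python
from typing import Any


def finish_job(autoscaling: Any, asg_name: str, hook_name: str,
               instance_id: str) -> str:
    """Call after each job. Returns "terminating" if the lifecycle action was
    completed, or "next_job" if the worker should fetch another job."""
    response = autoscaling.describe_auto_scaling_instances(InstanceIds=[instance_id])
    instances = response.get("AutoScalingInstances", [])
    state = instances[0]["LifecycleState"] if instances else None

    if state == "Terminating:Wait":
        # The ASG chose this instance for scale-in and is waiting on the hook.
        autoscaling.complete_lifecycle_action(
            LifecycleHookName=hook_name,
            AutoScalingGroupName=asg_name,
            LifecycleActionResult="CONTINUE",
            InstanceId=instance_id,
        )
        return "terminating"
    return "next_job"
```

If a job might outlast the hook's heartbeat timeout, the worker can extend the wait while it runs with `record_lifecycle_action_heartbeat`.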

[1] https://docs.aws.amazon.com/autoscaling/ec2/userguide/lambda-custom-termination-policy.html

[2] https://docs.aws.amazon.com/autoscaling/ec2/userguide/ec2-auto-scaling-instance-protection.html

[3] https://docs.aws.amazon.com/autoscaling/ec2/userguide/lifecycle-hooks.html

answered 7 months ago
