Spot Fleet down scaling instance selection

What method is used to select which instances are terminated when manually scaling down a Spot Fleet? I'm managing a render fleet and would like to terminate only idle instances when scaling in. Is it possible to choose instances based on CloudWatch metrics, or to insert a custom termination policy like the one an Auto Scaling group has?

Asked 2 years ago, 387 views
2 answers

Yes, CloudWatch metrics are available for Spot Fleet.

You can configure them by following the link below and choosing the metrics you need.

https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/spot-fleet-cloudwatch-metrics.html

Spot Fleet Capacity Rebalancing (proactively replaces Spot Instances at elevated risk of interruption): https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/spot-fleet-capacity-rebalance.html

Handling Spot Instance interruptions: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/spot-interruptions.html
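The answer doesn't show how to act on those metrics. One possible workaround (an assumption of this sketch, not something the answer describes) is to pick idle instances yourself from their per-instance `CPUUtilization` metric, lower the fleet's target capacity with `ExcessCapacityTerminationPolicy="noTermination"` so the fleet doesn't choose its own instances to terminate, and then terminate the idle ones directly. The sketch assumes a `maintain`-type fleet where every instance has a weighted capacity of 1; the 5% threshold and 15-minute window are hypothetical values, and low CPU is only a proxy for "idle" on a render node. The `ec2` and `cloudwatch` arguments are boto3 clients you create yourself.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical tuning values: an instance whose average CPU stayed below
# IDLE_CPU_PERCENT for the whole LOOKBACK window is treated as idle.
IDLE_CPU_PERCENT = 5.0
LOOKBACK = timedelta(minutes=15)


def is_idle(datapoints, threshold=IDLE_CPU_PERCENT):
    """True if every CloudWatch datapoint's Average is below threshold.

    No datapoints means "not idle": the instance may have just launched.
    """
    if not datapoints:
        return False
    return all(dp["Average"] < threshold for dp in datapoints)


def find_idle_instances(ec2, cloudwatch, fleet_id, now=None):
    """Return IDs of active fleet instances whose recent CPU looks idle."""
    now = now or datetime.now(timezone.utc)
    # Pagination via NextToken is omitted to keep the sketch short.
    resp = ec2.describe_spot_fleet_instances(SpotFleetRequestId=fleet_id)
    idle = []
    for inst in resp["ActiveInstances"]:
        stats = cloudwatch.get_metric_statistics(
            Namespace="AWS/EC2",
            MetricName="CPUUtilization",
            Dimensions=[{"Name": "InstanceId", "Value": inst["InstanceId"]}],
            StartTime=now - LOOKBACK,
            EndTime=now,
            Period=300,  # basic monitoring publishes 5-minute datapoints
            Statistics=["Average"],
        )
        if is_idle(stats["Datapoints"]):
            idle.append(inst["InstanceId"])
    return idle


def scale_in_idle(ec2, cloudwatch, fleet_id, max_to_remove):
    """Shrink the fleet by terminating only idle instances (weight 1 each)."""
    idle = find_idle_instances(ec2, cloudwatch, fleet_id)[:max_to_remove]
    if not idle:
        return []
    cfg = ec2.describe_spot_fleet_requests(SpotFleetRequestIds=[fleet_id])
    current = cfg["SpotFleetRequestConfigs"][0]["SpotFleetRequestConfig"]["TargetCapacity"]
    # Lower the target first with noTermination, so the fleet neither picks
    # its own instances to terminate nor replaces the ones we remove next.
    ec2.modify_spot_fleet_request(
        SpotFleetRequestId=fleet_id,
        TargetCapacity=current - len(idle),
        ExcessCapacityTerminationPolicy="noTermination",
    )
    ec2.terminate_instances(InstanceIds=idle)
    return idle
```

The order matters in this sketch: terminating first would leave the fleet below its target, and a `maintain` fleet would launch replacements.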

AWS
Dhilip
Answered 2 years ago

Custom termination policies are only available on Auto Scaling groups (ASGs), not on Spot Fleet, so you would need to migrate to an ASG to use them; for your use case, an ASG may work better. In Spot Fleet, when an instance needs to be terminated, the fleet selects it based on the allocation strategy.
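On an ASG, a custom termination policy is a Lambda function that Auto Scaling invokes with the candidate `Instances` and the `CapacityToTerminate`, and that returns the IDs to terminate under the key `InstanceIDs`. A minimal sketch, assuming each instance counts as 1 unit of capacity and ignoring the per-Availability-Zone split in `CapacityToTerminate`. `is_instance_idle` is hypothetical: it stands in for whatever check your render software allows, such as an HTTP status request to the instance. Check the linked documentation for how Auto Scaling treats a response with fewer IDs than requested.

```python
def is_instance_idle(instance_id):
    """Hypothetical idle check; not part of any AWS API.

    Replace with your own logic, e.g. querying a status endpoint that the
    render agent on the instance exposes.
    """
    raise NotImplementedError("supply your own idle check")


def select_idle_instances(event, idle_check):
    """Choose idle candidates, up to the total capacity Auto Scaling asked for."""
    wanted = sum(item["Capacity"] for item in event["CapacityToTerminate"])
    chosen = []
    for inst in event["Instances"]:
        if len(chosen) >= wanted:
            break
        if idle_check(inst["InstanceId"]):
            chosen.append(inst["InstanceId"])
    return chosen


def lambda_handler(event, context):
    # Auto Scaling reads the selected instance IDs from "InstanceIDs".
    return {"InstanceIDs": select_idle_instances(event, is_instance_idle)}
```

Keeping the selection logic in `select_idle_instances` with the idle check passed in makes it easy to test without a running fleet.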

On an ASG you could also use scale-in protection or terminating lifecycle hooks. The best option depends on factors such as how long the renders take and how often new jobs come in.

Here are a few high-level examples of when each feature might work best and how to use it:

  1. Custom termination policy [1]: This works best if the Lambda function has a way to determine which instances are idle, such as an API/HTTP request it can send to each instance.

  2. Scale-in protection [2]: This might work best if each instance runs only a single job, or runs very long jobs. Enable the setting on the ASG itself so that all new instances launch with protection enabled. Then, once a render finishes, check whether there are any more jobs in the queue; if there aren't, disable protection so the instance can be scaled in.

  3. Terminating lifecycle hook [3]: This works best for short jobs (at most 2 hours, since that is the maximum heartbeat timeout for a lifecycle hook). Put a terminating lifecycle hook on the ASG, and whenever the application finishes a job, have it check whether the instance is currently in the Terminating:Wait state. If it is, complete the lifecycle action to finish the termination process; if not, grab a new job.
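Options 2 and 3 both come down to a check the render worker makes after each job. A minimal sketch of that check, assuming `autoscaling` is a boto3 Auto Scaling client, and that the caller supplies the instance ID, the ASG and hook names, and `queue_is_empty`, a hypothetical callable that inspects your job queue.

```python
def release_protection_if_queue_empty(autoscaling, asg_name, instance_id, queue_is_empty):
    """Option 2: after a render, drop scale-in protection if no work is queued."""
    if queue_is_empty():
        autoscaling.set_instance_protection(
            InstanceIds=[instance_id],
            AutoScalingGroupName=asg_name,
            ProtectedFromScaleIn=False,
        )
        return True
    return False


def finish_if_terminating(autoscaling, asg_name, hook_name, instance_id):
    """Option 3: after a render, let termination proceed if it is pending.

    Returns True if the lifecycle action was completed (stop taking jobs),
    False if the worker should grab the next job.
    """
    resp = autoscaling.describe_auto_scaling_instances(InstanceIds=[instance_id])
    instances = resp["AutoScalingInstances"]
    if instances and instances[0]["LifecycleState"] == "Terminating:Wait":
        autoscaling.complete_lifecycle_action(
            LifecycleHookName=hook_name,
            AutoScalingGroupName=asg_name,
            LifecycleActionResult="CONTINUE",
            InstanceId=instance_id,
        )
        return True
    return False
```

With option 2, if a new job can arrive after protection was released, the worker may also want to set `ProtectedFromScaleIn=True` again before starting it, so the instance isn't scaled in mid-render.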

[1] https://docs.aws.amazon.com/autoscaling/ec2/userguide/lambda-custom-termination-policy.html

[2] https://docs.aws.amazon.com/autoscaling/ec2/userguide/ec2-auto-scaling-instance-protection.html

[3] https://docs.aws.amazon.com/autoscaling/ec2/userguide/lifecycle-hooks.html

AWS
Answered 2 years ago