Yes, CloudWatch metrics are available for Spot Fleet.
You can set this up by following the link below and choosing the metrics you need.
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/spot-fleet-cloudwatch-metrics.html
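As a rough illustration of reading one of those metrics programmatically, here is a minimal Python sketch. It assumes the `AWS/EC2Spot` namespace and the `FleetRequestId` dimension described on the page above, and it takes a boto3 CloudWatch client as an argument. The helper names, the 30-minute window, and the fleet ID `sfr-EXAMPLE` are placeholders of mine, not anything from your setup.

```python
from datetime import datetime, timedelta, timezone


def fleet_metric_request(fleet_request_id, metric_name, minutes=30, now=None):
    """Build keyword arguments for CloudWatch get_metric_statistics."""
    end = now or datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/EC2Spot",  # assumed namespace for Spot Fleet metrics
        "MetricName": metric_name,   # e.g. "FulfilledCapacity" or "TargetCapacity"
        "Dimensions": [{"Name": "FleetRequestId", "Value": fleet_request_id}],
        "StartTime": end - timedelta(minutes=minutes),
        "EndTime": end,
        "Period": 300,               # 5-minute buckets
        "Statistics": ["Average"],
    }


def latest_average(cloudwatch, fleet_request_id, metric_name):
    """Return the most recent Average datapoint, or None if there is no data."""
    response = cloudwatch.get_metric_statistics(
        **fleet_metric_request(fleet_request_id, metric_name)
    )
    # Datapoints are not guaranteed to arrive in time order, so sort first.
    datapoints = sorted(response["Datapoints"], key=lambda d: d["Timestamp"])
    return datapoints[-1]["Average"] if datapoints else None


# Usage (needs boto3 and AWS credentials; "sfr-EXAMPLE" is a placeholder):
#   import boto3
#   cloudwatch = boto3.client("cloudwatch")
#   print(latest_average(cloudwatch, "sfr-EXAMPLE", "FulfilledCapacity"))
```

Passing the client in keeps the helpers easy to try against a stub before pointing them at a real account.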
Spot Fleet capacity rebalancing (scale in/out): https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/spot-fleet-capacity-rebalance.html
Handling Spot Fleet interruptions: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/spot-interruptions.html
Custom termination policies are only available on Auto Scaling groups (ASGs), not on Spot Fleet, so you would need to migrate to an ASG to use them. For your use case, an ASG may work better. In Spot Fleet, when an instance needs to be terminated, the fleet selects it based on the allocation strategy.
On an ASG, you could also use scale-in protection or terminating lifecycle hooks. The best option depends on factors such as how long the renders take and how often new jobs come in.
Here are a few high-level examples of when each feature might work best and how to use it:
- Custom termination policy [1]: This works if you have a way for the Lambda function to determine which instance is idle, such as an API/HTTP request that it can send to the instance.
- Scale-in protection [2]: If each instance only runs a single job, or runs very long jobs, then this might work best. Enable the setting on the ASG itself so that all new instances launch with protection enabled. Once a render finishes, check whether there are any more jobs in the queue; if there aren't, disable protection so that the instance can be scaled in.
- Terminating lifecycle hook [3]: This would work best for short jobs (max of 2 hours, since that's the max heartbeat timeout for a lifecycle hook). Put a terminating lifecycle hook on the ASG, and whenever the application finishes a job, have it check whether the instance is currently in the Terminating:Wait state on the ASG. If it is, complete the lifecycle hook to finish the termination process; if not, grab a new job.
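For the custom termination policy option, a Lambda function could look roughly like the sketch below. It assumes the event carries `CapacityToTerminate` and `Instances` lists and that the function answers with an `InstanceIDs` list, which is my reading of [1], so check the field names there. `is_idle` is a hypothetical placeholder for whatever idle check your render nodes expose, and the sketch assumes each instance counts as one unit of capacity (no instance weighting).

```python
def is_idle(instance_id):
    """Hypothetical placeholder: replace with a real check, for example an
    HTTP request to a status endpoint on the render node."""
    return False


def select_instances(event, idle_check=is_idle):
    """Pick idle instances, at most the requested capacity per
    Availability Zone and market option."""
    remaining = {
        (c["AvailabilityZone"], c["InstanceMarketOption"]): c["Capacity"]
        for c in event.get("CapacityToTerminate", [])
    }
    chosen = []
    for instance in event.get("Instances", []):
        key = (instance["AvailabilityZone"], instance["InstanceMarketOption"])
        if remaining.get(key, 0) > 0 and idle_check(instance["InstanceId"]):
            chosen.append(instance["InstanceId"])
            remaining[key] -= 1
    return chosen


def lambda_handler(event, context):
    # An empty list tells the group that nothing is safe to terminate right now.
    return {"InstanceIDs": select_instances(event)}
```

Keeping the idle check as a separate callable means the selection logic can be tested locally without touching any instances.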
[1] https://docs.aws.amazon.com/autoscaling/ec2/userguide/lambda-custom-termination-policy.html
[2] https://docs.aws.amazon.com/autoscaling/ec2/userguide/ec2-auto-scaling-instance-protection.html
[3] https://docs.aws.amazon.com/autoscaling/ec2/userguide/lifecycle-hooks.html
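For options [2] and [3], the check a worker runs after finishing a render could be sketched like this in Python. The functions take a boto3 Auto Scaling client as an argument and use its `set_instance_protection`, `describe_auto_scaling_instances`, and `complete_lifecycle_action` calls. The function names, the `queue_empty` flag, and the ASG and hook names are placeholders I made up; how you detect an empty queue depends on your job system.

```python
def release_protection_if_queue_empty(autoscaling, asg_name, instance_id, queue_empty):
    """Option [2]: drop scale-in protection once there is no more work."""
    if not queue_empty:
        return False
    autoscaling.set_instance_protection(
        InstanceIds=[instance_id],
        AutoScalingGroupName=asg_name,
        ProtectedFromScaleIn=False,
    )
    return True


def complete_hook_if_terminating(autoscaling, asg_name, hook_name, instance_id):
    """Option [3]: if the instance is waiting in the terminating hook,
    complete the hook so termination can proceed."""
    response = autoscaling.describe_auto_scaling_instances(InstanceIds=[instance_id])
    instances = response["AutoScalingInstances"]
    state = instances[0]["LifecycleState"] if instances else None
    if state != "Terminating:Wait":
        return False  # not being scaled in, so the caller can grab a new job
    autoscaling.complete_lifecycle_action(
        LifecycleHookName=hook_name,
        AutoScalingGroupName=asg_name,
        LifecycleActionResult="CONTINUE",
        InstanceId=instance_id,
    )
    return True


# Usage (needs boto3 and AWS credentials; names are placeholders):
#   import boto3
#   autoscaling = boto3.client("autoscaling")
#   if not complete_hook_if_terminating(autoscaling, "render-asg", "drain-hook", my_instance_id):
#       ...  # fetch the next job
```

Both functions return a boolean so the worker loop can decide whether to exit or keep pulling jobs.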