Your approach to falling back from Spot to On-Demand instances in an Auto Scaling Group (ASG) is on the right track, but there are some important considerations and potential improvements.
Regarding the timing of the INSTANCE_LAUNCH_ERROR event:
- The ASG does not immediately create an INSTANCE_LAUNCH_ERROR event when it receives a capacity rebalance recommendation or Spot termination notification. Instead, it continues attempting to launch new Spot Instances in other enabled Availability Zones.
- The ASG keeps trying to launch new instances across different Availability Zones until it succeeds or until the old instances are forcibly terminated. This process can take some time, and no error event is generated in the meantime.
- There is no setting that makes the ASG fire the INSTANCE_LAUNCH_ERROR event sooner than it does by default.
Given this behavior, your current approach may not be as reliable or immediate as you're hoping for. Here are some recommendations for a more robust solution:
- Implement Capacity Rebalancing: This feature proactively attempts to replace Spot Instances that receive a rebalance recommendation before they are interrupted.
- Use lifecycle hooks: Develop custom actions to gracefully handle instance replacements. This can give you more control over the process.
- Monitor rebalance recommendations and interruption notices using Amazon EventBridge: This can trigger checkpoints for your workload or handle interruptions more gracefully.
- Consider using a mixed instances policy in your ASG: This allows you to specify both Spot and On-Demand instances, giving you more flexibility.
- Implement automated scaling policies: Use CloudWatch alarms to monitor your traffic patterns and set up auto scaling policies to automatically adjust the mix of Spot and On-Demand instances based on availability and demand.
- Use weighted target group routing with an Application Load Balancer: This can help you gradually shift traffic between Spot and On-Demand instances during scaling operations.
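As a rough illustration of the Capacity Rebalancing and mixed instances policy recommendations above, here is a minimal sketch that builds the request for boto3's `create_auto_scaling_group`. The helper name `build_mixed_asg_request`, the group and launch template names, the subnet IDs, the instance types, and the On-Demand base and percentage values are all placeholders assumed for the example, not values from your setup.

```python
from __future__ import annotations

from typing import Any


def build_mixed_asg_request(
    group_name: str,
    launch_template_name: str,
    subnet_ids: list[str],
    instance_types: list[str],
    min_size: int = 1,
    max_size: int = 4,
    on_demand_base: int = 1,
    on_demand_percentage_above_base: int = 25,
) -> dict[str, Any]:
    """Build keyword arguments for autoscaling.create_auto_scaling_group.

    All default values are illustrative assumptions; tune them to your workload.
    """
    if not 0 <= on_demand_percentage_above_base <= 100:
        raise ValueError("on_demand_percentage_above_base must be between 0 and 100")
    return {
        "AutoScalingGroupName": group_name,
        "MinSize": min_size,
        "MaxSize": max_size,
        # The API expects subnets as a single comma-separated string.
        "VPCZoneIdentifier": ",".join(subnet_ids),
        # Ask the ASG to launch replacements when a rebalance recommendation arrives.
        "CapacityRebalance": True,
        "MixedInstancesPolicy": {
            "LaunchTemplate": {
                "LaunchTemplateSpecification": {
                    "LaunchTemplateName": launch_template_name,
                    "Version": "$Latest",
                },
                # Several instance types give Spot more capacity pools to draw from.
                "Overrides": [{"InstanceType": t} for t in instance_types],
            },
            "InstancesDistribution": {
                # This many instances are always On-Demand.
                "OnDemandBaseCapacity": on_demand_base,
                # Share of capacity above the base that is On-Demand.
                "OnDemandPercentageAboveBaseCapacity": on_demand_percentage_above_base,
                "SpotAllocationStrategy": "price-capacity-optimized",
            },
        },
    }


if __name__ == "__main__":
    request = build_mixed_asg_request(
        "demo-asg", "demo-template", ["subnet-a", "subnet-b"], ["m5.large", "m5a.large"]
    )
    print(request["MixedInstancesPolicy"]["InstancesDistribution"])
    # To apply it (requires boto3 and AWS credentials):
    #   import boto3
    #   boto3.client("autoscaling").create_auto_scaling_group(**request)
```

Keeping a small On-Demand base means some capacity survives even when Spot is unavailable in every enabled Availability Zone, which is the situation where the launch error is slow to appear.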
While there isn't a native AWS solution that perfectly fits your requirements, combining these strategies can create a more reliable and responsive system for handling Spot capacity issues and falling back to On-Demand instances when necessary.
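For the EventBridge monitoring piece, a minimal sketch of an event pattern that matches both Spot signals might look like the following. The function names and the rule name in the comment are hypothetical; the `source` and `detail-type` strings are the ones EC2 publishes for rebalance recommendations and Spot interruption warnings.

```python
import json

# detail-type values EC2 publishes to EventBridge for the two Spot signals.
SPOT_SIGNAL_DETAIL_TYPES = [
    "EC2 Instance Rebalance Recommendation",
    "EC2 Spot Instance Interruption Warning",
]


def build_spot_signal_pattern() -> str:
    """Return the EventBridge event pattern as a JSON string."""
    return json.dumps({"source": ["aws.ec2"], "detail-type": SPOT_SIGNAL_DETAIL_TYPES})


def is_spot_signal(event: dict) -> bool:
    """Mirror the pattern locally, e.g. as a guard inside a Lambda handler."""
    return (
        event.get("source") == "aws.ec2"
        and event.get("detail-type") in SPOT_SIGNAL_DETAIL_TYPES
    )


if __name__ == "__main__":
    print(build_spot_signal_pattern())
    # To create the rule (requires boto3 and AWS credentials; rule name is a placeholder):
    #   import boto3
    #   boto3.client("events").put_rule(
    #       Name="spot-signals",
    #       EventPattern=build_spot_signal_pattern(),
    #       State="ENABLED",
    #   )
```

A target on this rule (for example a Lambda function) could then raise On-Demand capacity itself instead of waiting for the ASG to report a launch failure. What that target does is up to you and is not shown here.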
Remember to thoroughly test any implementation in a non-production environment before deploying to production, as the behavior can be complex and may vary depending on your specific configuration and workload.
Sources
Sequence of events when an asg receives a capacity rebalance recommendation or spot termination notification | AWS re:Post
Automating EC2 Instance Scaling for Variable Traffic with Zero Downtime | AWS re:Post
Refer to https://medium.com/@ranshn/using-preferred-instance-types-availability-zones-or-purchase-options-with-ec2-auto-scaling-groups-1a5997f3cb0d. Your use case matches point 4 under "So when would you want to use this approach?"
