The behavior you're experiencing is known as "thrashing" in the context of Auto Scaling, and it can occur with target tracking policies under certain conditions. In your case, it appears that the Auto Scaling group is rapidly scaling out and then immediately scaling in, causing instability in your service.
This thrashing behavior is likely due to a mismatch between the scaling metric (SQS queue depth divided by number of instances) and the actual workload processing capacity of your instances. When new instances are launched, they may not immediately start processing messages from the queue, leading to a high metric value that triggers further scale-out. Then, as instances begin to process messages, the metric suddenly drops, triggering a scale-in event.
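The feedback loop described here can be sketched as a small function. The target value of 100 messages per instance and the queue depths below are illustrative assumptions, not numbers from your setup:

```python
def backlog_per_instance(visible_messages: int, running_instances: int) -> float:
    """The target tracking metric: SQS queue depth divided by instance count."""
    if running_instances == 0:
        return float(visible_messages)
    return visible_messages / running_instances

# Illustrative thrashing sequence, with a target of 100 messages per instance:
# 2 instances, 1000 queued messages -> metric is 500, far above target,
# so target tracking scales out aggressively.
print(backlog_per_instance(1000, 2))   # 500.0

# 10 instances are now registered, but the new ones are still booting and the
# queue has not drained -> metric reads 100, at target, so scaling pauses.
print(backlog_per_instance(1000, 10))  # 100.0

# Once the new instances warm up, the queue drains quickly -> metric is 10,
# far below target, so the group scales back in and the cycle repeats.
print(backlog_per_instance(100, 10))   # 10.0
```

This is why the plain ratio whipsaws: the denominator reacts to launches immediately, while the numerator only reacts after warm-up.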
To address this issue, consider the following adjustments:
- Increase the cooldown period for your Auto Scaling group. This gives new instances more time to start processing messages before another scaling decision is made.
- Implement instance scale-in protection for a short period after launch. This prevents new instances from being terminated immediately after they're launched, giving them time to pick up workload.
- Adjust your target tracking metric. Instead of a simple ratio of queue depth to instance count, consider a metric that accounts for instance warm-up time or message processing rate.
- Implement step scaling instead of target tracking. This lets you define more granular scaling actions tied to different alarm thresholds, reducing the likelihood of extreme scaling events.
- Review your application code to ensure that new instances can quickly begin processing messages from the queue. Slow startup times contribute to scaling instability.
- Consider using ECS Capacity Providers with managed scaling if you're on Amazon ECS. This can provide more stable scaling behavior for container workloads.
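Of these options, step scaling is the easiest to reason about offline: each alarm band maps to a bounded capacity change, so one noisy metric sample cannot trigger an extreme scale-out. A minimal sketch of that mapping, where the target of 100 and the threshold bands are illustrative assumptions rather than recommended values:

```python
def step_scaling_adjustment(metric: float, target: float) -> int:
    """Step scaling sketch: map how far the metric sits from the target
    to a bounded instance-count change, instead of target tracking's
    proportional response. Bands and sizes here are illustrative."""
    deviation = metric - target
    if deviation > 2 * target:
        return 3   # far above target: add 3 instances, no more
    if deviation > 0:
        return 1   # slightly above target: add 1
    if deviation < -0.5 * target:
        return -1  # well below target: remove 1
    return 0       # inside the tolerance band: do nothing

print(step_scaling_adjustment(500, 100))  # 3
print(step_scaling_adjustment(150, 100))  # 1
print(step_scaling_adjustment(100, 100))  # 0
print(step_scaling_adjustment(40, 100))   # -1
```

In a real policy you would express these bands as step adjustments on the scaling policy itself; the point is that the response is capped per evaluation rather than proportional to the (temporarily inflated) metric.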
By implementing these changes, you should be able to achieve more stable and predictable scaling behavior, avoiding the thrashing you've experienced. Remember to monitor your Auto Scaling group closely after making changes to ensure the new configuration meets your needs.