You may need to check the box for "Turn off scale-in" on the Memory policy.
Target tracking will scale in conservatively. This means it will choose not to scale in if it believes the action is unsafe and might immediately cause a scale-out. Just looking at the metrics, memory is at 56% across 2 tasks. So in theory, if a scale-in happened, going down to 1 task, there would be 112% usage on the one remaining task, which would cause a scale-out (and also crash the task). You can verify this by looking at the "not scaled reasons" in the Application Auto Scaling activity history: https://docs.aws.amazon.com/autoscaling/application/userguide/application-auto-scaling-scaling-activities.html#include-not-scaled-activities-with-the-aws-cli
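For example, a rough sketch of that check with boto3 (the cluster and service names in the ResourceId are placeholders; adjust them to match your service):

```python
import boto3

# Application Auto Scaling client (not the EC2 Auto Scaling one)
aas = boto3.client("application-autoscaling")

# Placeholder resource ID: service/<cluster-name>/<service-name>
resource_id = "service/my-cluster/my-service"

response = aas.describe_scaling_activities(
    ServiceNamespace="ecs",
    ResourceId=resource_id,
    IncludeNotScaledActivities=True,  # also return activities that were skipped
)

for activity in response["ScalingActivities"]:
    print(activity["StartTime"], activity["StatusCode"], activity["Description"])
    # Skipped scale-ins carry a NotScaledReasons list explaining why
    for reason in activity.get("NotScaledReasons", []):
        print("  not scaled:", reason["Code"])
```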
Because the operating system doesn't free up memory immediately when it isn't actively being used, it's likely that scale-in is actually safe, and the one remaining task after a scale-in would still be under your 90% threshold. This is why you would need to consider turning off scale-in for the memory metric: the metric value does not reflect the real active memory usage of the OS.
If you make this change, scale-in will happen whenever the CPU metric is low.
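If you manage the policy programmatically rather than through the console checkbox, the same setting is the DisableScaleIn flag on the target tracking configuration. A minimal boto3 sketch, assuming a memory policy with a 90% target on a hypothetical service/my-cluster/my-service (the policy name and resource ID are placeholders):

```python
import boto3

aas = boto3.client("application-autoscaling")

# Re-register the memory policy with scale-in disabled; the CPU policy
# is left unchanged and remains responsible for scaling in.
aas.put_scaling_policy(
    PolicyName="memory-target-tracking",          # hypothetical policy name
    ServiceNamespace="ecs",
    ResourceId="service/my-cluster/my-service",   # placeholder cluster/service
    ScalableDimension="ecs:service:DesiredCount",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 90.0,
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ECSServiceAverageMemoryUtilization",
        },
        "DisableScaleIn": True,  # equivalent to the "Turn off scale-in" checkbox
    },
)
```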
The issue you're experiencing with ECS service auto-scaling not triggering scale-in actions despite metrics being under the policy thresholds is a common challenge. There are a few potential reasons and solutions to consider:
- Conflicting policies: When you have multiple scaling policies (CPU and Memory in this case), scale-in only occurs when all policies agree to scale in. This prevents oscillation where one metric might trigger scale-out while another triggers scale-in. Ensure that both CPU and Memory utilization are consistently low for scale-in to occur.
- Cooldown periods: Check your scale-in cooldown period. If it's set too high, it might prevent immediate scale-in actions even when conditions are met. Consider adjusting this value if necessary.
- Minimum capacity: Verify that your current task count isn't already at the minimum capacity set for your service. If it is, no further scale-in will occur.
- Task placement strategy: If you're using a binpack strategy, tasks might not redistribute optimally after a scale-in event. This can lead to tasks remaining on different instances instead of consolidating.
- Capacity Provider settings: If you're using a Capacity Provider for your Auto Scaling Group, check if Managed Termination Protection is enabled. This feature prevents instances from being scaled in as long as there are tasks running on them.
To address these issues:
- Use Target Tracking scaling policies instead of step scaling if possible. They're generally more effective at maintaining your desired utilization levels (see the sketch after this list).
- If using step scaling, ensure your alarm thresholds and scaling adjustments are appropriately configured.
- Consider implementing a custom task rebalancer using a Lambda function triggered periodically by CloudWatch Events. This can help redistribute tasks more efficiently after scale-in events.
- If using a Capacity Provider, try disabling Managed Termination Protection and enabling Managed Draining instead. Set the target capacity utilization to 100% to encourage better bin-packing.
- Monitor the CapacityProviderReservation metric in CloudWatch to gain insights into your cluster's scaling behavior (a query sketch follows below).
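As a rough illustration of the target tracking, cooldown, and minimum capacity points above, here is a boto3 sketch that registers the service as a scalable target and attaches a CPU target tracking policy. The 70% target, the cooldown values, and all resource and policy names are placeholders, not recommendations:

```python
import boto3

aas = boto3.client("application-autoscaling")

resource_id = "service/my-cluster/my-service"  # placeholder

# Register the service as a scalable target with the min/max capacity
# you expect (scale-in never goes below MinCapacity).
aas.register_scalable_target(
    ServiceNamespace="ecs",
    ResourceId=resource_id,
    ScalableDimension="ecs:service:DesiredCount",
    MinCapacity=1,
    MaxCapacity=4,
)

# CPU target tracking policy with explicit scale-in/scale-out cooldowns.
aas.put_scaling_policy(
    PolicyName="cpu-target-tracking",  # hypothetical policy name
    ServiceNamespace="ecs",
    ResourceId=resource_id,
    ScalableDimension="ecs:service:DesiredCount",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 70.0,
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ECSServiceAverageCPUUtilization",
        },
        "ScaleInCooldown": 300,   # seconds between scale-in activities
        "ScaleOutCooldown": 60,   # seconds between scale-out activities
    },
)
```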
By addressing these points, you should be able to improve your ECS service's auto-scaling behavior, particularly for scale-in actions.
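For the CapacityProviderReservation metric mentioned above, one simple way to pull recent values is the CloudWatch API; this sketch assumes a cluster name and capacity provider name, both placeholders:

```python
import boto3
from datetime import datetime, timedelta, timezone

cloudwatch = boto3.client("cloudwatch")

# CapacityProviderReservation lives in the AWS/ECS/ManagedScaling namespace
# and is dimensioned by cluster and capacity provider name.
response = cloudwatch.get_metric_statistics(
    Namespace="AWS/ECS/ManagedScaling",
    MetricName="CapacityProviderReservation",
    Dimensions=[
        {"Name": "ClusterName", "Value": "my-cluster"},                      # placeholder
        {"Name": "CapacityProviderName", "Value": "my-capacity-provider"},   # placeholder
    ],
    StartTime=datetime.now(timezone.utc) - timedelta(hours=3),
    EndTime=datetime.now(timezone.utc),
    Period=300,
    Statistics=["Average"],
)

for point in sorted(response["Datapoints"], key=lambda p: p["Timestamp"]):
    # 100 means capacity matches demand; above 100 indicates scale-out pressure,
    # below 100 means the ASG has more instances than the running tasks need.
    print(point["Timestamp"], point["Average"])
```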
Sources
Issue with AWS ECS Auto-Scaling and Binpack Task Placement Strategy: Tasks Not Shifting Back After Scale-In | AWS re:Post
Autoscalling in ECS cluster service does not behave as expected causing target deregistration and service downtime | AWS re:Post
ECS Service auto scaling policy conflict the one to other policy | AWS re:Post
Amazon ECS EC2-backed auto scale-out and scale-in | AWS re:Post
This did not help me much.
Thanks for the advice. Before trying your approach, I tested setting the Memory scaling policy target to 160%, and it triggered a scale-in down to 1 task. Do you have any documentation related to your statement that "Just looking at the metrics, memory is at 56% across 2 tasks. So in theory, if a scale-in happened, going down to 1 task, there would be 112% usage on the one remaining task, which would cause a scale-out (and also crash the task)"? I think it's true, but I can't find documentation about this.
Not specifically, but it's discussed in the second-to-last bullet of this doc: https://docs.aws.amazon.com/autoscaling/application/userguide/target-tracking-scaling-policy-overview.html#target-tracking-considerations, starting with "You may see gaps between the target value..."