
ECS Service Auto Scaling - Scale In Not Working

0

Hi, I've configured ECS cluster auto scaling and ECS service auto scaling and need some guidance with the service scaling behavior.

Cluster Auto Scaling Configuration:

managed_scaling {
  maximum_scaling_step_size = 1000
  minimum_scaling_step_size = 1
  status                    = "ENABLED"
  target_capacity           = 80
}

Auto Scaling Group (ASG):

  • Min: 2
  • Max: 4
  • Desired: 2

I've set up an ECS cluster with EC2 launch type using t3.medium instances. Each instance is running approximately 6 tasks. Cluster auto scaling is working well — it increases or decreases the instance count based on the target tracking policy.

For service deployment, I'm using strategies such as binpack based on CPU and memory.

Service Auto Scaling Configuration:

  • Metric: ECSServiceAverageMemoryUtilization
    
  • Target value: 85%
    
  • Scale-out cooldown: 120 seconds
    
  • Scale-in cooldown: 180 seconds
    
  • Disable scale-in: False
    

ASG for Service Auto Scaling:

  • Min: 1
    
  • Max: 2
    
  • Desired: 1 (to be managed automatically based on target tracking)
    

Issue: With this configuration, the service scales out correctly when memory utilization crosses the threshold. However, scale-in does not occur, even when memory usage drops below the target and the alarm is in the ALARM state with all datapoints satisfied. There are no logs or scaling activities indicating why scale-in is not triggered.

I'm not sure what I'm missing or if this is a known issue. Has anyone faced a similar problem? Any guidance or documentation references would be greatly appreciated.

2 Answers
1

There are several potential reasons why your ECS service auto scaling is not scaling in despite the alarm being in the "In Alarm" state:

  1. Safety mechanism: Application Auto Scaling has a built-in safety feature that prevents scale-in actions if it calculates that removing a task would immediately cause the metric to exceed the target value again. With a small service (min: 1, max: 2), removing one task might push the memory utilization back over the 85% threshold, causing the system to avoid scaling in.

  2. Cooldown periods: Your scale-in cooldown period is set to 180 seconds. Make sure enough time has passed since the last scaling activity before expecting a scale-in action.

  3. Metric calculation: The system might be determining that the current load, even though below the threshold, doesn't warrant removing a task based on its internal calculations of how that would affect the remaining infrastructure.
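To make the first point concrete, here is a rough back-of-the-envelope sketch. It assumes the total memory in use stays constant when a task is removed, so the remaining tasks absorb it evenly. That is a simplification and not a description of Application Auto Scaling's internal calculation; the function names are made up for illustration.

```python
def projected_utilization(current_tasks, avg_utilization_pct, new_tasks):
    """Average utilization after resizing, assuming total usage is unchanged."""
    if new_tasks < 1:
        raise ValueError("new_tasks must be at least 1")
    return current_tasks * avg_utilization_pct / new_tasks


def max_utilization_for_scale_in(current_tasks, target_pct, remove=1):
    """Highest current average utilization at which removing `remove` tasks
    would still leave the remaining tasks at or below the target."""
    remaining = current_tasks - remove
    if remaining < 1:
        raise ValueError("cannot remove that many tasks")
    return target_pct * remaining / current_tasks


# Values from the question: 2 running tasks, 85% memory target.
print(max_utilization_for_scale_in(2, 85.0))  # 42.5
# At 60% average across 2 tasks, a single task would need 120% of its reservation.
print(projected_utilization(2, 60.0, 1))      # 120.0
```

Under this assumption, a service running 2 tasks with an 85% target would need its average memory utilization to fall to roughly 42.5% or lower before going down to 1 task looks safe, which is well below the point where the metric first dips under the target.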

To troubleshoot and potentially resolve this issue:

  1. Check scaling activities: Use the AWS CLI to check why scale-in didn't happen by reviewing the scaling activities, including activities that didn't result in scaling.

  2. Consider alternative scaling approaches: If target tracking isn't working as expected, you might need to:

     • Use step scaling instead (though this may result in a less stable environment)
     • Change the target tracking metric to one that better represents when scale-in is safe
     • Add another policy tracking a different metric while disabling scale-in on the memory policy

  3. Review your binpack strategy: Your deployment strategy using binpack for CPU and memory might be affecting how tasks are distributed, potentially making it harder for the system to determine when scale-in is appropriate.

  4. Adjust target capacity: Your cluster auto scaling target capacity is set to 80%, which might interact with your service scaling decisions. You might need to adjust this value to better align with your service scaling needs.
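For the "check scaling activities" step, a minimal sketch using boto3 rather than the AWS CLI might look like the following. It relies on the Application Auto Scaling `describe_scaling_activities` call with `IncludeNotScaledActivities=True`, which also returns activities that did not change capacity. The cluster and service names in the usage comment are placeholders, the helper name is made up, and only the first page of results is read.

```python
def summarize_scaling_activities(client, cluster, service):
    """Return (status, description, not-scaled reason codes) per activity.

    `client` is expected to be a boto3 "application-autoscaling" client.
    Only the first page of results is inspected in this sketch.
    """
    resp = client.describe_scaling_activities(
        ServiceNamespace="ecs",
        ResourceId=f"service/{cluster}/{service}",
        ScalableDimension="ecs:service:DesiredCount",
        IncludeNotScaledActivities=True,  # include activities that did not scale
    )
    rows = []
    for activity in resp.get("ScalingActivities", []):
        reasons = [r.get("Code") for r in activity.get("NotScaledReasons", [])]
        rows.append(
            (activity.get("StatusCode"), activity.get("Description"), reasons)
        )
    return rows


# Example usage (requires boto3 and AWS credentials; names are placeholders):
#   import boto3
#   client = boto3.client("application-autoscaling")
#   for row in summarize_scaling_activities(client, "my-cluster", "my-service"):
#       print(row)
```

If a scale-in was evaluated and skipped, the reason codes on the corresponding activity are usually the quickest pointer to why.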
Sources:
  • AWS ECS EC2-backed auto scaling scale-in is not working | AWS re:Post
  • Optimize Amazon ECS service auto scaling - Amazon Elastic Container Service
  • Automatically manage Amazon ECS capacity with cluster auto scaling - Amazon Elastic Container Service

answered a year ago
EXPERT
reviewed a year ago
0

You mentioned: "However, scale-in does not occur, even when the memory usage drops below the target and alarm status is IN alarm all the datapoints are satisfied". However, if the status is IN ALARM, it will not scale in, as it is still in alarm.

Do you have CPU/memory metrics for the EC2 instances, or only for the ECS service? It's possible that CPU/memory is still high on the instances even though the ECS service has cooled down. Also, scale-in will only take effect after the specified time has elapsed.
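It may help to confirm which alarm is actually in ALARM. A target tracking policy normally manages two CloudWatch alarms: a high alarm that drives scale-out and a low alarm that drives scale-in. The sketch below lists them with boto3's CloudWatch `describe_alarms`. It assumes the usual naming pattern for ECS target tracking alarms (`TargetTracking-service/<cluster>/<service>-AlarmHigh-...` and `...-AlarmLow-...`); check the alarm names in your account, since that pattern is an assumption here, as is the helper name.

```python
def target_tracking_alarm_states(cloudwatch, cluster, service):
    """Map alarm name -> (role, state) for a service's target tracking alarms.

    `cloudwatch` is expected to be a boto3 "cloudwatch" client.
    The name prefix below is an assumed convention; adjust it if your
    alarms are named differently. Only the first page is inspected.
    """
    prefix = f"TargetTracking-service/{cluster}/{service}-"
    resp = cloudwatch.describe_alarms(AlarmNamePrefix=prefix)
    states = {}
    for alarm in resp.get("MetricAlarms", []):
        name = alarm["AlarmName"]
        if "-AlarmHigh-" in name:
            role = "scale-out"
        elif "-AlarmLow-" in name:
            role = "scale-in"
        else:
            role = "unknown"
        states[name] = (role, alarm["StateValue"])
    return states


# Example usage (requires boto3 and AWS credentials; names are placeholders):
#   import boto3
#   cw = boto3.client("cloudwatch")
#   print(target_tracking_alarm_states(cw, "my-cluster", "my-service"))
```

If the alarm in ALARM is the high one, the policy is still asking for scale-out. If it is the low one and nothing happens, the scaling activities for the service are the next place to look.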

EXPERT
answered a year ago
