AWS ECS Fargate service auto-scaling is not working properly


As the title says, I created a service in AWS ECS using Fargate.

But when I added auto-scaling to the AWS ECS service, scale-in failed. I have confirmed the following points.

  1. CloudWatch successfully sent the event
  2. I used the aws-cli describe-scaling-activities command to see why scale-in failed, but I did not see any records of attempted scale-in.

I initially used Terraform to create the ECS service auto-scaling, then removed it from the Terraform configuration and enabled auto-scaling manually in the console instead. That also produced no scaling records.

Has anyone encountered a similar problem?

2 Answers

For 2), did you include the --include-not-scaled-activities flag? If not, can you re-run the command with that included?
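For anyone who prefers to run the same check from a script, here is a minimal sketch in Python. It assumes boto3 is installed and AWS credentials are configured; the cluster and service names are placeholders you would supply, and the helper names are made up for illustration. It reads only the first page of results.

```python
def fetch_scaling_activities(cluster, service):
    """Return scaling activities for an ECS service, including not-scaled ones."""
    import boto3  # imported lazily so the helper below works without AWS access

    client = boto3.client("application-autoscaling")
    response = client.describe_scaling_activities(
        ServiceNamespace="ecs",
        ResourceId=f"service/{cluster}/{service}",
        ScalableDimension="ecs:service:DesiredCount",
        IncludeNotScaledActivities=True,  # same effect as the CLI flag
    )
    return response["ScalingActivities"]


def not_scaled_reason_codes(activities):
    """Collect the distinct reason codes from activities that did not scale."""
    codes = []
    for activity in activities:
        for reason in activity.get("NotScaledReasons", []):
            code = reason.get("Code")
            if code and code not in codes:
                codes.append(code)
    return codes
```

Calling not_scaled_reason_codes(fetch_scaling_activities("my-cluster", "my-service")) would list whichever reason codes were recorded. An empty list means no not-scaled activity was logged at all, which matches what the question describes.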

What scaling policy(s) are configured for this ECS service? Can you share the config(s)?

The most common reasons for a scale-in being blocked after the associated CloudWatch alarm has triggered are listed here: https://docs.aws.amazon.com/autoscaling/application/userguide/application-auto-scaling-scaling-activities.html#understand-not-scaled-reason-codes

AWS
answered 7 months ago
    1. Yes, I have checked using --include-not-scaled-activities

    My auto-scaling configuration is:

      type: Target tracking
      metric: ECSServiceAverageMemoryUtilization
      target_value: 70

    By the way, my service averages about 55% during off-peak hours.

  • The output of --include-not-scaled-activities is de-duplicated, so if an earlier scale-in also failed, it may have been for the same reason. There are many general reasons target tracking can choose not to scale in when it doesn't think it's safe to do so. Some reasons scale-in might not happen that wouldn't be logged by that call:

    1. Suspended Processes on the scalable target (describe-scalable-targets to check)
    2. The policy is set to 'disable scale-in'
    3. Alarm never went into the ALARM state (see Jeff's answer for more details)
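The first two checks in the list above can be scripted. This is a rough sketch, assuming boto3 and configured credentials; the function names and the cluster/service arguments are hypothetical. It looks at the SuspendedState field of the scalable target and the DisableScaleIn field of each target tracking policy.

```python
def scale_in_blockers(scalable_target, policies):
    """Return human-readable reasons why scale-in may be blocked by configuration."""
    blockers = []
    suspended = scalable_target.get("SuspendedState", {})
    if suspended.get("DynamicScalingInSuspended"):
        blockers.append("dynamic scale-in is suspended on the scalable target")
    for policy in policies:
        config = policy.get("TargetTrackingScalingPolicyConfiguration", {})
        if config.get("DisableScaleIn"):
            name = policy.get("PolicyName", "?")
            blockers.append(f"policy {name} has scale-in disabled")
    return blockers


def check_service(cluster, service):
    """Fetch the scalable target and policies for an ECS service and check them."""
    import boto3  # imported lazily so scale_in_blockers works without AWS access

    client = boto3.client("application-autoscaling")
    resource_id = f"service/{cluster}/{service}"
    targets = client.describe_scalable_targets(
        ServiceNamespace="ecs", ResourceIds=[resource_id]
    )["ScalableTargets"]
    policies = client.describe_scaling_policies(
        ServiceNamespace="ecs", ResourceId=resource_id
    )["ScalingPolicies"]
    target = targets[0] if targets else {}
    return scale_in_blockers(target, policies)
```

An empty result only rules out these two configuration causes; it says nothing about whether the alarm ever reached the ALARM state.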

I recently configured our ECS Fargate services with auto-scaling. One thing I found helpful for understanding this was to look at the CloudWatch alarms that were created as part of the auto-scaling policy. Each auto-scaling policy should have a pair of alarms, named something like:

  • TargetTracking-service/<cluster name>/<service name>-AlarmHigh-<some-id>
  • TargetTracking-service/<cluster name>/<service name>-AlarmLow-<some-id>

The CloudWatch page for each alarm has a History tab showing when the alarm goes in and out of the ALARM state. In general, if the "AlarmHigh" alarm goes into the ALARM state, you should see a corresponding entry in describe-scaling-activities --include-not-scaled-activities (or an actual scaling activity). For the services we've configured, every "AlarmHigh" alarm requires three consecutive minutes with the metric above the alarm threshold (average memory utilization > 70% in your case) before it may start a scale-out operation, and every "AlarmLow" alarm requires fifteen consecutive minutes below 90% of the threshold (63% in your case) before it may start a scale-in operation.
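To make the scale-in condition concrete, here is a small sketch of the arithmetic in Python. The 90% factor and the fifteen-datapoint window are only what I observed on our own alarms, not documented constants, so check the actual AlarmLow definition in your account. The function names are made up for illustration.

```python
def alarm_low_threshold(target_value, fraction=0.9):
    """Scale-in alarm threshold, assuming it sits at `fraction` of the target."""
    return target_value * fraction


def alarm_low_would_fire(datapoints, target_value, periods=15, fraction=0.9):
    """True if the most recent `periods` datapoints are all below the threshold."""
    if len(datapoints) < periods:
        return False
    threshold = alarm_low_threshold(target_value, fraction)
    return all(value < threshold for value in datapoints[-periods:])
```

With a target of 70 the threshold works out to 63, so a service holding at 55% for fifteen one-minute datapoints would satisfy the condition under these assumptions, while a single datapoint at 65% inside that window would not.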

In my (relatively brief) experience, the main reason I see for services not scaling is that when auto-scaling looks at the CloudWatch data, it appears to do its own extrapolation of which direction the metric is trending, and it seems to focus too much on the three data points in alarm. If your application's memory utilization spikes to 90% for two data points, but the third data point is 75%, auto-scaling seems to take that as a sign that memory utilization is trending back down, so scaling out is not necessary. Sometimes this trend analysis lets auto-scaling detect that a metric has increased rapidly, so it scales out more quickly than it would without any extrapolation. Still, it seems to me that auto-scaling should always scale out by some amount if the metric ever goes into alarm.

-- Jeff

answered 7 months ago
