Sagemaker Application Autoscaling policy doesn't scale in


Hi there,

I am working with SageMaker real-time endpoints and Application Auto Scaling policies. I can make my endpoint scale out successfully, but I can't get the scale-in part to work. Below you can find the details of my setup.

I registered the scalable target via:

aws application-autoscaling register-scalable-target \
    --service-namespace sagemaker \
    --resource-id endpoint/MY_ENDPOINT_NAME/variant/AllTraffic \
    --scalable-dimension sagemaker:variant:DesiredInstanceCount \
    --min-capacity 1 \
    --max-capacity 2

I created the JSON file for my target tracking scaling policy as:

{
    "TargetValue": 90000.0,
    "CustomizedMetricSpecification": {
        "MetricName": "ModelLatency",
        "Namespace": "AWS/SageMaker",
        "Dimensions": [
            {"Name": "EndpointName", "Value": "MY_ENDPOINT_NAME"},
            {"Name": "VariantName", "Value": "AllTraffic"}
        ],
        "Statistic": "Average",
        "Unit": "Microseconds"
    },
    "ScaleInCooldown": 120,
    "ScaleOutCooldown": 120,
    "DisableScaleIn": false
}

I then applied it to my endpoint successfully via:

aws application-autoscaling put-scaling-policy \
    --service-namespace sagemaker \
    --policy-name MY_POLICY_NAME \
    --resource-id endpoint/MY_ENDPOINT_NAME/variant/AllTraffic \
    --scalable-dimension sagemaker:variant:DesiredInstanceCount \
    --policy-type TargetTrackingScaling \
    --target-tracking-scaling-policy-configuration file://sagemaker_target_tracking_latency.json

This creates 2 alarms in CloudWatch.

The endpoint scales out from 1 to 2 when the High alarm goes off.

When the Low alarm goes off instead, nothing happens. However, in CloudWatch I can see the triggered action.

When I try to describe the scaling activities, I can't see anything related to the Low alarm setting the instance count to 1 though. Also in CloudTrail there's no mention of the scale in activity. I am running out of things I can check. Could anyone help out here?

1 Answer

CloudTrail only logs API calls, not the internal activities of other AWS services. Can you run this and check the output?

aws application-autoscaling describe-scaling-activities \
    --include-not-scaled-activities \
    --service-namespace sagemaker \
    --resource-id <YourId>

The --include-not-scaled-activities flag reports whether autoscaling chose not to scale in for some reason. The response codes are described in the Application Auto Scaling documentation.
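To make the not-scaled reasons easier to spot, the same call can be wrapped with a `--query` filter. This is only a sketch: the function name `show_not_scaled` is made up for illustration, and the selection of fields (`StartTime`, `StatusCode`, `StatusMessage`, `NotScaledReasons[].Code`) is my assumption about what is most useful to look at here.

```shell
# Hypothetical helper: list scaling activities for one resource, including
# the ones where Application Auto Scaling decided NOT to scale.
# $1 is the resource ID, e.g. endpoint/MY_ENDPOINT_NAME/variant/AllTraffic
show_not_scaled() {
    aws application-autoscaling describe-scaling-activities \
        --service-namespace sagemaker \
        --resource-id "$1" \
        --include-not-scaled-activities \
        --query 'ScalingActivities[].{Start:StartTime,Status:StatusCode,Message:StatusMessage,Reasons:NotScaledReasons[].Code}' \
        --output json
}

# Usage (requires AWS credentials and the AWS CLI):
#   show_not_scaled "endpoint/MY_ENDPOINT_NAME/variant/AllTraffic"
```

Entries where `Reasons` is not null are the ones where scaling was evaluated but skipped.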

EDIT: Reading the exact policy config again, I see it's configured with a custom metric for ModelLatency. Latency isn't usually a good metric for target tracking, because it doesn't change proportionally to the desired capacity (but target tracking is built assuming the metric DOES change proportionally with the capacity). Example of a good metric: CPU will roughly double if you halve the number of instances, so there's a proportional relationship between the metric and the capacity. If the number of instances behind a SageMaker endpoint doubles, there's no telling what that will do to latency.
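To illustrate the proportionality point with numbers, here is a rough sketch. It assumes a simplified model in which the desired capacity is ceil(current capacity × metric / target). That formula is my assumption for illustration, not the service's documented algorithm, and the helper name `desired_capacity` is hypothetical.

```shell
# Simplified proportional model (an assumption, NOT the actual service logic):
#   desired = ceil(current_capacity * metric / target)
# Arguments: $1 = current capacity, $2 = metric value, $3 = target value
desired_capacity() {
    # Integer ceiling division using POSIX shell arithmetic
    echo $(( ($1 * $2 + $3 - 1) / $3 ))
}

# CPU-like metric: 2 instances at 30% against a 60% target.
# Halving the fleet should roughly double CPU to 60%, so 1 instance is enough.
desired_capacity 2 30 60        # prints 1

# ModelLatency: 2 instances averaging 80000 us against a 90000 us target.
# The proportional model still asks for 2 instances.
desired_capacity 2 80000 90000  # prints 2

# ModelLatency at 40000 us: the model says 1 instance, but that rests on the
# guess that latency would double to 80000 us, which nothing guarantees.
desired_capacity 2 40000 90000  # prints 1
```

Under this model the CPU case behaves sensibly, while the latency cases depend on a relationship between instance count and latency that may not exist for your workload.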

answered 10 days ago
  • Same result as described above unfortunately.

  • That's weird. If the alarm went into the ALARM state (which it does, judging from the details you provided), then Auto Scaling would have evaluated whether it should scale or not. Most of the common reasons for scaling not happening get logged in the activity history when you include the --include-not-scaled-activities flag. It does de-dupe; is the most recent activity (even if from a while ago) a failure? If so, that same failure reason might still be recurring. See the edit above for more details.
