SageMaker Autoscaling Delay


I am setting up autoscaling for a real-time inference endpoint in SageMaker. I set up a load test using Locust, and with relatively high numbers (i.e. 100 users, spawned at 10 users per second) I can see on CloudWatch the InvocationsPerInstance metric ramp up pretty quickly to 20 000. I view InvocationsPerInstance with the 'Sum' statistic and a period of 1 minute on CloudWatch.
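
For context, the Locust test looks roughly like this (a minimal sketch; the endpoint name and payload are placeholders, and it only generates load without reporting per-request stats to Locust):

import json

import boto3
from locust import User, constant, task

runtime = boto3.client("sagemaker-runtime")


class SageMakerUser(User):
    wait_time = constant(0)  # fire invocations back to back

    @task
    def invoke(self):
        # Placeholder endpoint name and payload
        runtime.invoke_endpoint(
            EndpointName="my-realtime-endpoint",
            ContentType="application/json",
            Body=json.dumps({"inputs": "test"}),
        )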

Then, I created an autoscaling policy for my endpoint, with the following settings:

  • SageMakerVariantInvocationsPerInstance → target value: 1500
  • Scale-in cooldown (seconds) → 100
  • Scale-out cooldown (seconds) → 10

Based on this, I would expect that the moment the sum of InvocationsPerInstance exceeds 1500, a scale-out would be triggered. It does work, but with a significant delay, i.e. the metric stays above 20 000 for more than 5 minutes before the scale-out happens. Scale-in is even more delayed: when I stop the test, so with 0 InvocationsPerInstance, it takes more than 25 minutes before the scale-in happens.

See the graph below to see the delay in the scale out:

[Graph: InvocationsPerInstance stays above the threshold for several minutes before the instance count increases]

Why is it so delayed? Is this expected behaviour, or am I doing something wrong in the way I calculate the metrics?

Thank you so much! Really appreciate your help and guidance!


EDIT: I checked the CloudWatch alarms, and I can see that:

  • For scale-out the threshold is → InvocationsPerInstance > 1500 for 3 datapoints within 3 minutes
  • For scale-in the threshold is → InvocationsPerInstance < 1350 for 15 datapoints within 15 minutes

So this appears to be the issue. Is there a way to change these minutes?
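
For reference, this is roughly how I read those alarm settings back (a sketch assuming the same aas_client as in the function below, plus a boto3 CloudWatch client, with endpoint_name and variant_name as in my setup):

cw_client = boto3.client("cloudwatch")

# Target tracking creates two managed alarms (scale-out high and scale-in low);
# their names are listed on the scaling policy itself
policies = aas_client.describe_scaling_policies(
    ServiceNamespace="sagemaker",
    ResourceId=f"endpoint/{endpoint_name}/variant/{variant_name}",
)
alarm_names = [
    alarm["AlarmName"]
    for policy in policies["ScalingPolicies"]
    for alarm in policy.get("Alarms", [])
]

for alarm in cw_client.describe_alarms(AlarmNames=alarm_names)["MetricAlarms"]:
    print(alarm["AlarmName"], alarm["Threshold"], alarm.get("Period"), alarm["EvaluationPeriods"])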

This is the way I add my policy:

import time

import boto3

# Application Auto Scaling client used to register the policy
aas_client = boto3.client("application-autoscaling")


def set_target_scaling_on_invocation(
    endpoint_name: str,
    variant_name: str,
    target_value: int,
    scale_out_cool_down: int = 10,
    scale_in_cool_down: int = 100,
) -> tuple[str, dict]:
    """
    Set scaling target based on invocation per instance with cool-down periods

    Parameters
    ----------
    endpoint_name : str
        The name of the endpoint
    variant_name : str
        The name of the endpoint variant
    target_value : int
        The target value for scaling based on invocations per instance
    scale_out_cool_down : int, optional
        The cool-down period for scaling out in seconds, by default 10
    scale_in_cool_down : int, optional
        The cool-down period for scaling in in seconds, by default 100

    Returns
    -------
    tuple[str, dict]
        The policy name and the response from the scaling policy creation
    """
    policy_name = f"target-tracking-invocations-{round(time.time())}"
    resource_id = f"endpoint/{endpoint_name}/variant/{variant_name}"

    response = aas_client.put_scaling_policy(
        PolicyName=policy_name,
        ServiceNamespace="sagemaker",
        ResourceId=resource_id,
        ScalableDimension="sagemaker:variant:DesiredInstanceCount",
        PolicyType="TargetTrackingScaling",
        TargetTrackingScalingPolicyConfiguration={
            "TargetValue": target_value,
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance",
            },
            "ScaleOutCooldown": scale_out_cool_down,
            "ScaleInCooldown": scale_in_cool_down,
            "DisableScaleIn": False,
        },
    )

    return policy_name, response
1 Answer
Accepted Answer

Hello,

It looks like you're using a TargetTracking scaling policy. That's a managed policy, so you can't control the alarm settings. The alarm times are chosen to reduce churn with real-world usage, which can be spikier than your load tests. If you want control over the alarms, you'll need to use Step Scaling policies instead.

Also keep in mind that SageMaker doesn't publish 0 values for all of its metrics, so a sudden drop to an actual 0 load can cause the alarm not to trigger for scale-in.
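
For what it's worth, a minimal sketch of the step scaling approach could look like the following, where you create and fully control the CloudWatch alarm and therefore its period and evaluation settings (resource and endpoint names are illustrative):

import boto3

aas_client = boto3.client("application-autoscaling")
cw_client = boto3.client("cloudwatch")

resource_id = "endpoint/my-endpoint/variant/AllTraffic"  # illustrative names

# Step scaling policy: add one instance whenever the attached alarm fires
step_policy = aas_client.put_scaling_policy(
    PolicyName="step-scale-out-invocations",
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    PolicyType="StepScaling",
    StepScalingPolicyConfiguration={
        "AdjustmentType": "ChangeInCapacity",
        "StepAdjustments": [{"MetricIntervalLowerBound": 0, "ScalingAdjustment": 1}],
        "Cooldown": 60,
        "MetricAggregationType": "Average",
    },
)

# You create the alarm yourself, so period, threshold and datapoints are up to you
cw_client.put_metric_alarm(
    AlarmName="invocations-per-instance-high",
    Namespace="AWS/SageMaker",
    MetricName="InvocationsPerInstance",
    Dimensions=[
        {"Name": "EndpointName", "Value": "my-endpoint"},
        {"Name": "VariantName", "Value": "AllTraffic"},
    ],
    Statistic="Sum",
    Period=60,
    EvaluationPeriods=1,
    Threshold=1500,
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=[step_policy["PolicyARN"]],
)

For a matching scale-in alarm, the TreatMissingData setting of put_metric_alarm is worth a look, since SageMaker may stop emitting the metric entirely at zero traffic.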

AWS EXPERT · answered a year ago
  • Thank you so much! It's clear now, appreciate a lot!
