SageMaker step scaling policy

I'm trying to define a step scaling policy for my SageMaker real-time endpoint, based on this example notebook. I understand that a step scaling policy defines thresholds at which a different number of instances is provisioned, but I'm confused because the policy doesn't seem to specify which metric to track.

In the example notebook, the code is the following:

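import time

import boto3

# Assumption: the notebook creates the Application Auto Scaling client like this
aas_client = boto3.client("application-autoscaling")


def set_step_scaling(endpoint_name, variant_name):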
    policy_name = "step-scaling-{}".format(str(round(time.time())))
    resource_id = "endpoint/{}/variant/{}".format(endpoint_name, variant_name)

    response = aas_client.put_scaling_policy(
        PolicyName=policy_name,
        ServiceNamespace="sagemaker",
        ResourceId=resource_id,
        ScalableDimension="sagemaker:variant:DesiredInstanceCount",
        PolicyType="StepScaling",
        StepScalingPolicyConfiguration={
            "AdjustmentType": "ChangeInCapacity",
            "StepAdjustments": [
                {
                    "MetricIntervalLowerBound": 0.0,
                    "MetricIntervalUpperBound": 5.0,
                    "ScalingAdjustment": 1,
                },
                {
                    "MetricIntervalLowerBound": 5.0,
                    "MetricIntervalUpperBound": 80.0,
                    "ScalingAdjustment": 3,
                },
                {
                    "MetricIntervalLowerBound": 80.0, 
                    "ScalingAdjustment": 4
                },
            ],
            "MetricAggregationType": "Average",
        },
    )

    return policy_name, response

I want it to track the metric SageMakerVariantInvocationsPerInstance. Can I specify it? If so, shouldn't the metric aggregation type be Sum instead of Average? I'm quite confused and would really appreciate your help. Thanks a lot!

asked a year ago, 225 views
2 Answers
Accepted Answer

With step scaling, the alarm isn't owned or controlled by AutoScaling (unlike target tracking, where AutoScaling manages the alarms for the policy).

So for each step scaling policy (you need one for scale-out and another for scale-in), you'll need to create a CloudWatch alarm yourself and add an alarm action that points to the policy. The metric to track, and the statistic applied to it, are set on that alarm rather than in the policy. This makes step scaling considerably more flexible and configurable than target tracking, but the trade-off is that it's much more work to configure (especially via code/IaC).
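
A minimal sketch of that wiring for the scale-out side, assuming boto3 and the response returned by put_scaling_policy in the question. The helper names, the alarm name, the threshold, and the evaluation settings are illustrative assumptions, not values from the notebook. The alarm is where the metric (InvocationsPerInstance in the AWS/SageMaker namespace) and its statistic are chosen; the MetricIntervalLowerBound and MetricIntervalUpperBound values in the policy are then read as offsets from the alarm threshold.

```python
def build_alarm_kwargs(endpoint_name, variant_name, policy_arn, threshold):
    """Build keyword arguments for cloudwatch.put_metric_alarm.

    The alarm name, period, and evaluation periods are hypothetical
    example values; tune them for your own endpoint.
    """
    return {
        "AlarmName": "step-scale-out-{}-{}".format(endpoint_name, variant_name),
        "Namespace": "AWS/SageMaker",
        "MetricName": "InvocationsPerInstance",
        "Dimensions": [
            {"Name": "EndpointName", "Value": endpoint_name},
            {"Name": "VariantName", "Value": variant_name},
        ],
        # The statistic is chosen here, on the alarm, not in the policy.
        "Statistic": "Sum",
        "Period": 60,
        "EvaluationPeriods": 3,
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        # The alarm action is the ARN of the step scaling policy.
        "AlarmActions": [policy_arn],
    }


def attach_alarm_to_policy(
    cloudwatch_client, put_policy_response, endpoint_name, variant_name, threshold
):
    """Create an alarm whose action triggers the step scaling policy."""
    policy_arn = put_policy_response["PolicyARN"]
    kwargs = build_alarm_kwargs(endpoint_name, variant_name, policy_arn, threshold)
    cloudwatch_client.put_metric_alarm(**kwargs)
    return kwargs
```

With real clients this could be called as attach_alarm_to_policy(boto3.client("cloudwatch"), response, endpoint_name, variant_name, threshold=100.0), where response is the value returned by aas_client.put_scaling_policy and the threshold is a placeholder. A scale-in policy would need its own alarm with a less-than comparison and negative ScalingAdjustment values.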

AWS
answered a year ago
  • Wonderful, thank you so much for your help! Appreciate a lot :)

The example notebook uses two scaling policies, as shown in this excerpt from the notebook:

invocation_scaling = set_target_scaling_on_invocation(
    endpoint_name=endpoint_name,
    variant_name=variant_name,
    target_value=invocations_per_instance * 1.3,
)

cpu_scaling = set_target_scaling_on_cpu_utilization(
    endpoint_name=endpoint_name, variant_name=variant_name, target_value=100
)

As you can see above, each policy is applied to an endpoint name and variant name and includes a target value. Both policies use predefined metrics, through helper functions imported with the following statements:

from endpoint_scaling import set_target_scaling_on_invocation
from endpoint_scaling import set_target_scaling_on_cpu_utilization

You can use SageMakerVariantInvocationsPerInstance, a predefined metric that represents the average number of times per minute that each instance for a variant is invoked. You can pass it to put_scaling_policy through the PredefinedMetricSpecification, as shown below:

response = client.put_scaling_policy(
    PolicyName='Invocations-ScalingPolicy',
    ServiceNamespace='sagemaker',  # The namespace of the AWS service that provides the resource.
    ResourceId=resource_id,  # "endpoint/<endpoint-name>/variant/<variant-name>"
    ScalableDimension='sagemaker:variant:DesiredInstanceCount',  # SageMaker supports only instance count
    PolicyType='TargetTrackingScaling',  # 'StepScaling'|'TargetTrackingScaling'
    TargetTrackingScalingPolicyConfiguration={
        'TargetValue': 4000,  # The target value for the metric: SageMakerVariantInvocationsPerInstance.
        'PredefinedMetricSpecification': {
            'PredefinedMetricType': 'SageMakerVariantInvocationsPerInstance',
        },
        'ScaleInCooldown': 300,  # Example value only: replace with your number of seconds
        'ScaleOutCooldown': 300,  # Example value only: replace with your number of seconds
        'DisableScaleIn': False,  # Indicates whether scale in by the target tracking policy is disabled.
    }
)
AWS
answered a year ago
  • Thanks a lot for your answer! Appreciate the time in answering my question!

    The target tracking policy is clear, but I was asking about the step scaling policy. How can I apply it so that it tracks the InvocationsPerInstance metric? After further research, I think I have to create a step scaling policy and then manually associate it with a CloudWatch alarm that I create myself. Is this correct, and is it best practice?

    Thanks a lot once again! Really appreciate it!
