SageMaker Application Auto Scaling policy doesn't scale in

Hi there,

I am working with SageMaker real-time endpoints and Application Auto Scaling policies. My endpoint scales out successfully, but I can't get scale-in to work. The details of my setup are below.

I registered the scalable target via:

aws application-autoscaling register-scalable-target \
    --service-namespace sagemaker \
    --resource-id endpoint/MY_ENDPOINT_NAME/variant/AllTraffic \
    --scalable-dimension sagemaker:variant:DesiredInstanceCount \
    --min-capacity 1 \
    --max-capacity 2

I created the JSON file for my target tracking scaling policy as follows:

{
    "TargetValue": 90000.0,
    "CustomizedMetricSpecification": {
        "MetricName": "ModelLatency",
        "Namespace": "AWS/SageMaker",
        "Dimensions": [
            {"Name": "EndpointName", "Value": "MY_ENDPOINT_NAME"},
            {"Name": "VariantName", "Value": "AllTraffic"}
        ],
        "Statistic": "Average",
        "Unit": "Microseconds"
    },
    "ScaleInCooldown": 120,
    "ScaleOutCooldown": 120,
    "DisableScaleIn": false
}

I then applied it to my endpoint successfully via:

aws application-autoscaling put-scaling-policy \
    --service-namespace sagemaker \
    --policy-name MY_POLICY_NAME \
    --resource-id endpoint/MY_ENDPOINT_NAME/variant/AllTraffic \
    --scalable-dimension sagemaker:variant:DesiredInstanceCount \
    --policy-type TargetTrackingScaling \
    --target-tracking-scaling-policy-configuration file://sagemaker_target_tracking_latency.json

This creates two alarms in CloudWatch. [Screenshot: the two alarms]

The endpoint scales out from 1 to 2 instances when the High alarm goes off. [Screenshot]

When the Low alarm goes off, however, nothing happens, even though CloudWatch shows that the alarm action was triggered. [Screenshot: the triggered action]

When I describe the scaling activities, I can't see anything about the Low alarm setting the instance count back to 1. CloudTrail doesn't mention a scale-in activity either. I am running out of things to check. Could anyone help?

1 Answer

Can you run the following and check the output? CloudTrail only logs API calls, not the internal activities of other AWS services.

aws application-autoscaling describe-scaling-activities --include-not-scaled-activities --service-namespace sagemaker --resource-id <YourId>

The --include-not-scaled-activities flag shows whether Application Auto Scaling chose not to scale in, and why. The reason codes are documented here: https://docs.aws.amazon.com/autoscaling/application/userguide/application-auto-scaling-scaling-activities.html#include-not-scaled-activities-with-the-aws-cli
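To make that output easier to scan, the not-scaled reasons can be pulled out of the JSON response. This is a minimal Python sketch, assuming the response has already been parsed into a dict with the documented ScalingActivities and NotScaledReasons fields. The helper name summarize_not_scaled is hypothetical and not part of any AWS SDK.

```python
def summarize_not_scaled(response):
    """Collect activities that carry a NotScaledReasons list.

    `response` is assumed to be the parsed JSON printed by
    `describe-scaling-activities --include-not-scaled-activities`.
    Returns a list of (StartTime, StatusCode, [reason codes]) tuples.
    """
    rows = []
    for activity in response.get("ScalingActivities", []):
        # Activities that actually scaled have no NotScaledReasons entry.
        reasons = activity.get("NotScaledReasons") or []
        if reasons:
            codes = [reason.get("Code") for reason in reasons]
            rows.append(
                (activity.get("StartTime"), activity.get("StatusCode"), codes)
            )
    return rows
```

One way to use it is to redirect the CLI output to a file, load it with json.load, and pass the result to the function. An empty result means no activity in the response recorded a reason for not scaling.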

EDIT: Reading the policy configuration again, I see it is configured with a custom metric for ModelLatency. Latency usually isn't a good metric for target tracking, because it doesn't change in proportion to the capacity, and target tracking is built on the assumption that the metric DOES change proportionally with the capacity. An example of a good metric is CPU utilization: it will roughly double if you halve the number of instances, so there is a proportional relationship between the metric and the capacity. If the number of SageMaker endpoint instances doubles, there's no telling what that will do to latency. See: https://docs.aws.amazon.com/autoscaling/application/userguide/application-auto-scaling-target-tracking.html#target-tracking-considerations
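The proportionality assumption can be illustrated with a toy calculation. This is a deliberately simplified model, not the actual Application Auto Scaling algorithm. The function names and the numbers in the comments are made up for illustration.

```python
import math


def predicted_metric(metric_value, current_capacity, new_capacity):
    """Metric value expected after a capacity change, IF the metric is
    inversely proportional to capacity (the assumption discussed above)."""
    return metric_value * current_capacity / new_capacity


def proportional_capacity(current_capacity, metric_value, target_value,
                          min_capacity, max_capacity):
    """Capacity that would bring a proportional metric back to its target,
    clamped to the registered min/max. Simplified illustration only."""
    desired = math.ceil(current_capacity * metric_value / target_value)
    return max(min_capacity, min(max_capacity, desired))


# CPU-like metric: 2 instances at 40% with an 80% target.
# proportional_capacity(2, 40.0, 80.0, 1, 2) gives 1, and
# predicted_metric(40.0, 2, 1) gives 80.0, which lands on the target.
```

For a metric like CPU utilization, the prediction is roughly right, so the controller can settle. For ModelLatency, nothing guarantees that the value after a capacity change resembles what predicted_metric returns, which is the sense in which latency is a poor fit for target tracking.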

AWS
Answered 8 months ago
  • Same result as described above, unfortunately.

  • That's weird. If the alarm went into the ALARM state (which, from all the details you provided, it does), then Application Auto Scaling would have evaluated whether it should scale. Most of the common reasons for scaling not happening are logged in the activity history when you pass the --include-not-scaled-activities flag. It does de-duplicate entries, though: is the most recent activity (even if it is from a while ago) a failure? If so, that same failure reason might still be recurring. See the edit above for more details.
