SageMaker autoscaling policy not working as expected

Hello all,

I have an async endpoint running a model on SageMaker. I want the endpoint to stay up for an hour (about 3600 seconds) and, if there has been no activity after that time, scale the number of instances down to zero to reduce costs.

This is what I have so far:

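import boto3

# Application Auto Scaling client. resource_id has the form
# "endpoint/<endpoint-name>/variant/<variant-name>".
asg_client = boto3.client("application-autoscaling")

# Configure autoscaling on the asynchronous endpoint, down to zero instances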
response = asg_client.register_scalable_target(
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    MinCapacity=0,
    MaxCapacity=1,
)

response = asg_client.put_scaling_policy(
    PolicyName="Request-ScalingPolicy-name-endpoint-sagemaker",
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 0.5,
        "CustomizedMetricSpecification": {
            "MetricName": "ApproximateBacklogSizePerInstance",
            "Namespace": "AWS/SageMaker",
            "Dimensions": [{"Name": "EndpointName", "Value": "name-endpoint-sagemaker"}],
            "Statistic": "Average",
        },
        "ScaleInCooldown": 3600, # duration until scale in begins (down to zero). In seconds.
        "ScaleOutCooldown": 300 # duration between scale out attempts. In seconds.
    }
)

The problem I am facing is that, although I set ScaleInCooldown to 3600, my endpoint still goes offline after 15 minutes of inactivity, judging by the CloudWatch alarm that gets created.
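To see what the policy actually created, the alarm settings can be read back from CloudWatch. Below is a minimal sketch, assuming the `put_scaling_policy` response lists the generated alarms under `Alarms` and that each alarm exposes top-level `Period` and `EvaluationPeriods` fields (alarms built on metric math may not). The helper names `alarm_window_seconds` and `describe_policy_alarms` are hypothetical, and the sample values are illustrative, chosen only to match the 15 minutes observed.

```python
def alarm_window_seconds(alarm):
    """Return how long the metric must breach before the alarm can fire, in seconds.

    Assumes a single-metric alarm with top-level Period and EvaluationPeriods.
    """
    return alarm["Period"] * alarm["EvaluationPeriods"]


def describe_policy_alarms(policy_response):
    """Fetch the CloudWatch alarms named in a put_scaling_policy response."""
    import boto3  # imported here so the helper above works without AWS access

    cloudwatch = boto3.client("cloudwatch")
    names = [a["AlarmName"] for a in policy_response.get("Alarms", [])]
    if not names:
        return []
    return cloudwatch.describe_alarms(AlarmNames=names)["MetricAlarms"]


if __name__ == "__main__":
    # Illustrative values only, not real output from an AWS account.
    sample = {"AlarmName": "example-AlarmLow", "Period": 60, "EvaluationPeriods": 15}
    print(alarm_window_seconds(sample))  # 900 seconds, i.e. 15 minutes
```

Calling `describe_policy_alarms(response)` on the response above and passing each result to `alarm_window_seconds` shows the window each alarm uses. That window comes from the alarm's own period settings and not from the policy's cooldown fields, which only apply after a scaling activity has taken place.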

How can I change this behavior so that the endpoint goes offline after one hour of inactivity? Why is the alarm created like that if I defined something different in the code?

Thanks in advance.