SageMaker autoscaling policy not working as expected


Hello all,

I have an asynchronous endpoint running a model on SageMaker. I want the endpoint to stay up for an hour (about 3600 seconds), and if there is no activity after that time, I want to scale it down to zero instances to reduce costs.

This is what I have set up:

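import boto3

# Application Auto Scaling client that manages SageMaker endpoint scaling
asg_client = boto3.client("application-autoscaling")
# Assumed format: "endpoint/<endpoint-name>/variant/<variant-name>";
# "AllTraffic" is only the SageMaker Python SDK's default variant name.
resource_id = "endpoint/name-endpoint-sagemaker/variant/AllTraffic"

# Configure autoscaling on the asynchronous endpoint, down to zero instances
response = asg_client.register_scalable_target(
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    MinCapacity=0,
    MaxCapacity=1,
)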

response = asg_client.put_scaling_policy(
    PolicyName="Request-ScalingPolicy-name-endpoint-sagemaker",
    ServiceNamespace="sagemaker",  
    ResourceId=resource_id, 
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 0.5, 
        "CustomizedMetricSpecification": {
            "MetricName": "ApproximateBacklogSizePerInstance",
            "Namespace": "AWS/SageMaker",
            "Dimensions": [{"Name": "EndpointName", "Value": "name-endpoint-sagemaker"}],
            "Statistic": "Average",
        },
        "ScaleInCooldown": 3600, # duration until scale in begins (down to zero). In seconds.
        "ScaleOutCooldown": 300 # duration between scale out attempts. In seconds.
    }
)

The problem is that, even though I set ScaleInCooldown to 3600, my endpoint still scales in to zero after about 15 minutes of inactivity, triggered by the CloudWatch alarm that the policy creates.
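One way to see where the 15 minutes come from is to read back the CloudWatch alarms that Application Auto Scaling attached to the policy and compare their evaluation window (`EvaluationPeriods` × `Period`) with the ScaleInCooldown value. The sketch below is a minimal example, assuming boto3 credentials are configured and the resource ID follows the `endpoint/<endpoint-name>/variant/<variant-name>` format; the helper names and the `AllTraffic` variant in the usage comment are hypothetical. These alarms are managed by Application Auto Scaling, so this is for inspecting them, not editing them.

```python
def evaluation_window_seconds(alarm: dict) -> int:
    # Period is the length of one datapoint in seconds; EvaluationPeriods is
    # how many datapoints the alarm evaluates before changing state.
    return alarm["EvaluationPeriods"] * alarm["Period"]


def summarize(alarms: list) -> list:
    # One readable line per alarm: its condition plus total evaluation window.
    return [
        f"{a['AlarmName']}: {a['ComparisonOperator']} {a['Threshold']} "
        f"for {a['EvaluationPeriods']} x {a['Period']}s "
        f"= {evaluation_window_seconds(a)}s"
        for a in alarms
    ]


def describe_policy_alarms(resource_id: str, policy_name: str) -> list:
    import boto3  # imported lazily so the helpers above work without AWS access

    aas = boto3.client("application-autoscaling")
    cloudwatch = boto3.client("cloudwatch")
    policies = aas.describe_scaling_policies(
        ServiceNamespace="sagemaker",
        ResourceId=resource_id,
        ScalableDimension="sagemaker:variant:DesiredInstanceCount",
        PolicyNames=[policy_name],
    )["ScalingPolicies"]
    # Each policy lists the alarms Application Auto Scaling created for it.
    names = [alarm["AlarmName"] for p in policies for alarm in p.get("Alarms", [])]
    if not names:
        return []
    return cloudwatch.describe_alarms(AlarmNames=names)["MetricAlarms"]


# Example (requires AWS credentials; the variant name is a placeholder):
# for line in summarize(describe_policy_alarms(
#         "endpoint/name-endpoint-sagemaker/variant/AllTraffic",
#         "Request-ScalingPolicy-name-endpoint-sagemaker")):
#     print(line)
```

If the scale-in alarm reports a window of about 900 seconds, that window, not ScaleInCooldown, is what decides when the first scale-in happens.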

How can I change this so that the endpoint scales in only after one hour of inactivity? And why is the alarm created this way when I specified something different in the code?
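For reference, ScaleInCooldown is documented as the time after one scale-in activity completes before another scale-in can start, so it does not delay the first scale-in; the managed scale-in alarm's own evaluation window decides that. A possible workaround, sketched here as an untested assumption rather than a confirmed fix: keep the target tracking policy for scale-out but set `DisableScaleIn`, and add a step scaling policy (`ExactCapacity` of 0) triggered by a custom CloudWatch alarm that needs 60 consecutive one-minute datapoints below the question's 0.5 threshold. The policy name `IdleScaleIn-<endpoint>` and the alarm name `<endpoint>-idle-3600s` are hypothetical.

```python
# Sketch (untested against AWS): scale out with target tracking, but scale in
# to zero only after one hour of low backlog, via a custom CloudWatch alarm.

NAMESPACE = "sagemaker"
DIMENSION = "sagemaker:variant:DesiredInstanceCount"
METRIC = {
    "Namespace": "AWS/SageMaker",
    "MetricName": "ApproximateBacklogSizePerInstance",
    "Statistic": "Average",
}


def target_tracking_kwargs(resource_id: str, endpoint_name: str, policy_name: str) -> dict:
    # Same policy as in the question, but DisableScaleIn=True stops the
    # managed scale-in alarm from removing instances.
    return {
        "PolicyName": policy_name,
        "ServiceNamespace": NAMESPACE,
        "ResourceId": resource_id,
        "ScalableDimension": DIMENSION,
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": 0.5,
            "CustomizedMetricSpecification": {
                **METRIC,
                "Dimensions": [{"Name": "EndpointName", "Value": endpoint_name}],
            },
            "ScaleOutCooldown": 300,
            "DisableScaleIn": True,
        },
    }


def scale_in_policy_kwargs(resource_id: str, policy_name: str) -> dict:
    # Step scaling policy that sets the variant to exactly 0 instances
    # whenever its alarm fires (metric below the alarm threshold).
    return {
        "PolicyName": policy_name,
        "ServiceNamespace": NAMESPACE,
        "ResourceId": resource_id,
        "ScalableDimension": DIMENSION,
        "PolicyType": "StepScaling",
        "StepScalingPolicyConfiguration": {
            "AdjustmentType": "ExactCapacity",
            "StepAdjustments": [{"MetricIntervalUpperBound": 0.0, "ScalingAdjustment": 0}],
            "Cooldown": 300,
        },
    }


def idle_alarm_kwargs(endpoint_name: str, policy_arn: str,
                      idle_seconds: int = 3600, period: int = 60) -> dict:
    if idle_seconds % period:
        raise ValueError("idle_seconds must be a multiple of period")
    periods = idle_seconds // period
    return {
        "AlarmName": f"{endpoint_name}-idle-{idle_seconds}s",  # hypothetical name
        **METRIC,
        "Dimensions": [{"Name": "EndpointName", "Value": endpoint_name}],
        "Period": period,
        # Every datapoint in the window must breach: one full idle hour.
        "EvaluationPeriods": periods,
        "DatapointsToAlarm": periods,
        "Threshold": 0.5,  # reuses the question's TargetValue; adjust as needed
        "ComparisonOperator": "LessThanThreshold",
        "AlarmActions": [policy_arn],
    }


def apply(resource_id: str, endpoint_name: str) -> None:
    import boto3  # imported lazily: the builders above need no AWS access

    aas = boto3.client("application-autoscaling")
    cloudwatch = boto3.client("cloudwatch")
    # Re-using the existing policy name updates that policy in place.
    aas.put_scaling_policy(**target_tracking_kwargs(
        resource_id, endpoint_name, f"Request-ScalingPolicy-{endpoint_name}"))
    policy_arn = aas.put_scaling_policy(**scale_in_policy_kwargs(
        resource_id, f"IdleScaleIn-{endpoint_name}"))["PolicyARN"]
    cloudwatch.put_metric_alarm(**idle_alarm_kwargs(endpoint_name, policy_arn))
```

Calling `apply(resource_id, "name-endpoint-sagemaker")` requires AWS credentials. How the custom alarm behaves once the variant is at zero instances, when per-instance backlog datapoints may be missing, was not verified here; the alarm's `TreatMissingData` setting is worth reviewing.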

Thanks in advance.
