Hello all,
I have an async endpoint running a model on SageMaker. I want the endpoint to stay up for about an hour (roughly 3600 seconds), and if there has been no activity in that time, I want to scale the number of instances down to zero to reduce costs.
This is what I have so far:
# Configure Autoscaling on asynchronous endpoint down to zero instances
response = asg_client.register_scalable_target(
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    MinCapacity=0,
    MaxCapacity=1,
)
response = asg_client.put_scaling_policy(
    PolicyName="Request-ScalingPolicy-name-endpoint-sagemaker",
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 0.5,
        "CustomizedMetricSpecification": {
            "MetricName": "ApproximateBacklogSizePerInstance",
            "Namespace": "AWS/SageMaker",
            "Dimensions": [{"Name": "EndpointName", "Value": "name-endpoint-sagemaker"}],
            "Statistic": "Average",
        },
        "ScaleInCooldown": 3600,  # duration until scale in begins (down to zero). In seconds.
        "ScaleOutCooldown": 300,  # duration between scale out attempts. In seconds.
    },
)
The problem I am facing is that even though I set ScaleInCooldown to 3600, my endpoint still goes offline after 15 minutes of inactivity, judging by the CloudWatch alarm that gets created.
How can I change this behavior so that the endpoint goes offline after one hour of inactivity? Why is the alarm created like that when I defined something different in the code?
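In case it helps with reproducing this, here is a minimal sketch of how the alarm settings attached to the policy can be read back. The helper names (`policy_alarm_names`, `alarm_window_seconds`, `summarize_policy_alarms`) are hypothetical names of my own. The clients are assumed to be `boto3.client("application-autoscaling")` and `boto3.client("cloudwatch")`, and I am assuming the alarms are plain single-metric alarms that expose `Period` and `EvaluationPeriods`.

```python
def policy_alarm_names(autoscaling_client, resource_id):
    """List the CloudWatch alarm names attached to the scaling policies.

    autoscaling_client is assumed to be boto3.client("application-autoscaling").
    """
    response = autoscaling_client.describe_scaling_policies(
        ServiceNamespace="sagemaker",
        ResourceId=resource_id,
    )
    return [
        alarm["AlarmName"]
        for policy in response["ScalingPolicies"]
        for alarm in policy.get("Alarms", [])
    ]


def alarm_window_seconds(alarm):
    """Seconds the metric must keep breaching before the alarm fires."""
    return alarm["Period"] * alarm["EvaluationPeriods"]


def summarize_policy_alarms(cloudwatch_client, alarm_names):
    """Map each alarm name to its evaluation window in seconds.

    cloudwatch_client is assumed to be boto3.client("cloudwatch").
    """
    response = cloudwatch_client.describe_alarms(AlarmNames=list(alarm_names))
    return {
        alarm["AlarmName"]: alarm_window_seconds(alarm)
        for alarm in response["MetricAlarms"]
    }
```

Calling `summarize_policy_alarms(cloudwatch_client, policy_alarm_names(asg_client, resource_id))` should return one entry per alarm. A window of 900 seconds on the scale-in alarm would match the 15 minutes I am seeing, which makes me think ScaleInCooldown does not control that window.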
Thanks in advance.