I have an AWS SageMaker asynchronous endpoint with auto scaling attached to it. The expected behavior is to scale in the instance count to zero when there are no requests and scale out when at least one request comes in. It is working as expected. However, when I check the CloudWatch alarms, I can see that the scale-in alarm remains in the 'In alarm' state even after the instance count reduces to zero. Is this normal behavior? How can I make its state 'OK' once the instance count reaches 0? I suspect this is delaying the scale-out action when a new request comes in.
My autoscaling configuration is as follows:
import boto3

# Application Auto Scaling client
asg_client = boto3.client("application-autoscaling")

# This is the format in which Application Auto Scaling references the endpoint
resource_id = f"endpoint/{async_predictor.endpoint_name}/variant/AllTraffic"

# Register the endpoint variant as a scalable target, allowing scale-in down to zero instances
response = asg_client.register_scalable_target(
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    MinCapacity=0,
    MaxCapacity=5,
)

# Scale-out policy: add one instance when the scale-out alarm fires
response = asg_client.put_scaling_policy(
    PolicyName=f"HasBacklogWithoutCapacity-ScalingPolicy-{async_predictor.endpoint_name}",
    ServiceNamespace="sagemaker",  # The namespace of the service that provides the resource
    ResourceId=resource_id,  # Endpoint name
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",  # SageMaker supports only instance count
    PolicyType="StepScaling",  # 'StepScaling' or 'TargetTrackingScaling'
    StepScalingPolicyConfiguration={
        "AdjustmentType": "ChangeInCapacity",  # ScalingAdjustment is an absolute change in capacity, not a percentage
        "MetricAggregationType": "Average",  # The aggregation type for the CloudWatch metrics
        "Cooldown": 300,  # Seconds to wait for a previous scaling activity to take effect
        "StepAdjustments": [  # Adjustments that scale based on the size of the alarm breach
            {
                "MetricIntervalLowerBound": 0,
                "ScalingAdjustment": 1,
            }
        ],
    },
)
step_scaling_policy_arn = response["PolicyARN"]

cw_client = boto3.client("cloudwatch")

# Scale-out alarm: fires when there is a backlog but no capacity to serve it
response = cw_client.put_metric_alarm(
    AlarmName=f"step_scaling_policy_alarm_name-{async_predictor.endpoint_name}",
    MetricName="HasBacklogWithoutCapacity",
    Namespace="AWS/SageMaker",
    Statistic="Average",
    EvaluationPeriods=2,
    DatapointsToAlarm=2,
    Threshold=1,
    ComparisonOperator="GreaterThanOrEqualToThreshold",
    TreatMissingData="missing",
    Dimensions=[
        {"Name": "EndpointName", "Value": async_predictor.endpoint_name},
    ],
    Period=60,
    AlarmActions=[step_scaling_policy_arn],
)

# Scale-in policy: remove one instance when the scale-in alarm fires
response_scalein = asg_client.put_scaling_policy(
    PolicyName=f"scaleinpolicy-{async_predictor.endpoint_name}",
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    PolicyType="StepScaling",
    StepScalingPolicyConfiguration={
        "AdjustmentType": "ChangeInCapacity",
        "MetricAggregationType": "Average",
        "Cooldown": 300,
        "StepAdjustments": [
            {
                "MetricIntervalUpperBound": 0,
                "ScalingAdjustment": -1,
            }
        ],
    },
)
stepin_scaling_policy_arn = response_scalein["PolicyARN"]

# Scale-in alarm: fires when the per-instance backlog drops to (near) zero
response = cw_client.put_metric_alarm(
    AlarmName=f"step_scale-in_policy-{async_predictor.endpoint_name}",
    MetricName="ApproximateBacklogSizePerInstance",
    Namespace="AWS/SageMaker",
    Statistic="Average",
    EvaluationPeriods=2,
    DatapointsToAlarm=2,
    Threshold=0.5,
    ComparisonOperator="LessThanOrEqualToThreshold",
    TreatMissingData="missing",
    Dimensions=[
        {"Name": "EndpointName", "Value": async_predictor.endpoint_name},
    ],
    Period=60,
    AlarmActions=[stepin_scaling_policy_arn],
)
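For what it's worth, here is a small helper (a sketch, not part of the configuration above) I use to watch both alarms' states while debugging this. It takes a CloudWatch client, e.g. `boto3.client("cloudwatch")`, and the alarm names mirror the f-strings used when the alarms were created above; adjust them if yours differ.

```python
def get_alarm_states(cw_client, endpoint_name):
    """Return {alarm_name: state} for the scale-out and scale-in alarms."""
    names = [
        f"step_scaling_policy_alarm_name-{endpoint_name}",  # scale-out alarm
        f"step_scale-in_policy-{endpoint_name}",            # scale-in alarm
    ]
    resp = cw_client.describe_alarms(AlarmNames=names)
    # StateValue is one of 'OK', 'ALARM', 'INSUFFICIENT_DATA'
    return {a["AlarmName"]: a["StateValue"] for a in resp["MetricAlarms"]}
```

Calling `get_alarm_states(boto3.client("cloudwatch"), async_predictor.endpoint_name)` after the instance count hits zero is how I observed the scale-in alarm stuck in 'ALARM'.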
Thank you for the answer. So in that case, when a new request comes in even after the cooldown period, the CloudWatch alarm that monitors the backlog size first changes its state from 'In alarm' to 'OK'. Only after that does the alarm monitoring HasBacklogWithoutCapacity change from 'OK' to 'In alarm', and only then does the scale-out action occur. This is creating a delay in scaling out. Is there any workaround to give priority to the scale-out alarm?
Changing the cooldown shouldn't be needed here. A scale-in cooldown doesn't block a scale-out action: https://docs.aws.amazon.com/autoscaling/application/userguide/application-auto-scaling-step-scaling-policies.html#step-scaling-cooldown
The two alarms aren't connected to each other; the scale-out alarm doesn't have to wait for the scale-in alarm to change state. However, the conditions for the scale-out alarm have to be met before it can change to the 'ALARM' state. Based on your alarm settings, that means waiting for two consecutive breaching minutes.
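One possible way to shorten that wait (my suggestion, not something the answer above prescribes, and it trades alarm stability for speed) is to re-create the scale-out alarm so that a single breaching minute is enough to move it into 'ALARM'. The sketch below only builds the `put_metric_alarm` keyword arguments, mirroring the question's alarm settings, so they can be inspected before applying.

```python
def fast_scale_out_alarm_kwargs(endpoint_name, policy_arn):
    """Build put_metric_alarm kwargs with a 1-minute trigger instead of 2."""
    return dict(
        AlarmName=f"step_scaling_policy_alarm_name-{endpoint_name}",
        MetricName="HasBacklogWithoutCapacity",
        Namespace="AWS/SageMaker",
        Statistic="Average",
        EvaluationPeriods=1,   # was 2: a single breaching datapoint now suffices
        DatapointsToAlarm=1,   # was 2
        Threshold=1,
        ComparisonOperator="GreaterThanOrEqualToThreshold",
        TreatMissingData="missing",
        Dimensions=[{"Name": "EndpointName", "Value": endpoint_name}],
        Period=60,
        AlarmActions=[policy_arn],
    )
```

Apply it with `cw_client.put_metric_alarm(**fast_scale_out_alarm_kwargs(async_predictor.endpoint_name, step_scaling_policy_arn))` (same name, so it overwrites the existing alarm). The flip side is that a single noisy datapoint can now trigger a scale-out.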