CloudWatch alarm stays in the 'In alarm' state even after the instance count scales in to zero


I have an AWS SageMaker asynchronous endpoint with auto scaling attached to it. The expected behavior is to scale in to zero instances when there are no requests and to scale out when at least one request arrives. This works as expected. However, when I check the CloudWatch alarms, I can see that the scale-in alarm remains in the 'In alarm' state even after the instance count reduces to zero. Is this normal behavior? How can I make its state OK once the instance count reaches 0? I suspect this is delaying the scale-out action when a new request arrives. My auto scaling configuration is as follows:

import boto3

# Application Auto Scaling client
asg_client = boto3.client("application-autoscaling")

# This is the format in which application autoscaling references the endpoint
resource_id = f"endpoint/{async_predictor.endpoint_name}/variant/AllTraffic"

# Configure Autoscaling on asynchronous endpoint down to zero instances
response = asg_client.register_scalable_target(
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    MinCapacity=0,
    MaxCapacity=5,
)

response = asg_client.put_scaling_policy(
    PolicyName=f'HasBacklogWithoutCapacity-ScalingPolicy-{async_predictor.endpoint_name}',
    ServiceNamespace="sagemaker",  # The namespace of the service that provides the resource.
    ResourceId=resource_id,  # Endpoint name
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",  # SageMaker supports only Instance Count
    PolicyType="StepScaling",  # 'StepScaling' or 'TargetTrackingScaling'
    StepScalingPolicyConfiguration={
        "AdjustmentType": "ChangeInCapacity", # Specifies whether the ScalingAdjustment value in the StepAdjustment property is an absolute number or a percentage of the current capacity. 
        "MetricAggregationType": "Average", # The aggregation type for the CloudWatch metrics.
        "Cooldown": 300, # The amount of time, in seconds, to wait for a previous scaling activity to take effect. 
        "StepAdjustments": # A set of adjustments that enable you to scale based on the size of the alarm breach.
        [ 
            {
              "MetricIntervalLowerBound": 0,
              "ScalingAdjustment": 1
            }
          ]
    },    
)

cw_client = boto3.client('cloudwatch')
step_scaling_policy_arn = response['PolicyARN']

response = cw_client.put_metric_alarm(
    AlarmName=f'step_scaling_policy_alarm_name-{async_predictor.endpoint_name}',
    MetricName='HasBacklogWithoutCapacity',
    Namespace='AWS/SageMaker',
    Statistic='Average',
    EvaluationPeriods=2,
    DatapointsToAlarm=2,
    Threshold=1,
    ComparisonOperator='GreaterThanOrEqualToThreshold',
    TreatMissingData='missing',
    Dimensions=[
        { 'Name':'EndpointName', 'Value':async_predictor.endpoint_name },
    ],
    Period=60,
    AlarmActions=[step_scaling_policy_arn]
)

response_scalein = asg_client.put_scaling_policy(
    PolicyName=f'scaleinpolicy-{async_predictor.endpoint_name}',
    ServiceNamespace="sagemaker",  # The namespace of the service that provides the resource.
    ResourceId=resource_id,  # Endpoint name
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",  # SageMaker supports only Instance Count
    PolicyType="StepScaling",  # 'StepScaling' or 'TargetTrackingScaling'
    StepScalingPolicyConfiguration={
        "AdjustmentType": "ChangeInCapacity", # Specifies whether the ScalingAdjustment value in the StepAdjustment property is an absolute number or a percentage of the current capacity. 
        "MetricAggregationType": "Average", # The aggregation type for the CloudWatch metrics.
        "Cooldown": 300, # The amount of time, in seconds, to wait for a previous scaling activity to take effect. 
        "StepAdjustments": # A set of adjustments that enable you to scale based on the size of the alarm breach.
        [ 
            {
              "MetricIntervalUpperBound": 0,
              "ScalingAdjustment": -1
            }
          ]
    },    
)

cw_client = boto3.client('cloudwatch')
stepin_scaling_policy_arn = response_scalein['PolicyARN']

response = cw_client.put_metric_alarm(
    AlarmName=f'step_scale-in_policy-{async_predictor.endpoint_name}',
    MetricName='ApproximateBacklogSizePerInstance',
    Namespace='AWS/SageMaker',
    Statistic='Average',
    EvaluationPeriods=2,
    DatapointsToAlarm=2,
    Threshold=0.5,
    ComparisonOperator='LessThanOrEqualToThreshold',
    TreatMissingData='missing',
    Dimensions=[
        { 'Name':'EndpointName', 'Value':async_predictor.endpoint_name },
    ],
    Period=60,
    AlarmActions=[stepin_scaling_policy_arn]
)
2 Answers

Hello,

Regarding "I doubt this is delaying the scale-out action when a new request comes": if an alarm doesn't change state, it doesn't trigger the Auto Scaling policy attached to it.

You may want to refer to https://repost.aws/knowledge-center/autoscaling-policy-cloudwatch-alarm

Also, check the Auto Scaling Details/Activity tab to verify whether there is any inconsistency.

Evaluation periods, threshold values, and global timeouts are also worth checking, as these factors influence CloudWatch alarm state transitions.

I see you're using StepScaling; it could be adding delays, since no new scaling event occurs during the cooldown period. Try adjusting the threshold, or use TargetTrackingScaling to maintain a target metric value (for example, the average backlog per instance) and see if that gives faster transitions.

HTH!

answered 10 months ago

Yes, this behavior is expected. A CloudWatch alarm remains in the 'In alarm' state for as long as its condition is still met, even after the instance count reaches zero. Your scale-in alarm fires when ApproximateBacklogSizePerInstance is less than or equal to the threshold of 0.5 for two consecutive evaluation periods; with zero instances and no traffic, the metric stays at or below that threshold (or stops being published, and TreatMissingData='missing' keeps the alarm in its current state), so the alarm never transitions back to 'OK'.

To avoid delaying the scale-out action when a new request arrives, you can reduce the cooldown period in the scaling policies. It is currently set to 300 seconds, which means that after a scaling activity, Application Auto Scaling waits 300 seconds before performing another scaling activity for that policy. Reducing it to a lower value, such as 60 seconds, makes the endpoint more responsive to changes in demand.
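Concretely, the only change to the step-scaling configuration from the question is the Cooldown value (60 below is an illustrative number, not a recommendation for every workload):

```python
# Scale-out step configuration from the question, with a shorter cooldown
# so a new scaling activity can start sooner after the previous one.
scale_out_config = {
    "AdjustmentType": "ChangeInCapacity",
    "MetricAggregationType": "Average",
    "Cooldown": 60,  # reduced from 300; example value, tune to your traffic
    "StepAdjustments": [
        {"MetricIntervalLowerBound": 0, "ScalingAdjustment": 1},
    ],
}
```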

answered 10 months ago
  • Thank you for the answer. So in that case, when a new request comes in, even after the cooldown period, the CloudWatch alarm that monitors the backlog size first changes its state from 'In alarm' to 'OK'. Only after that does the alarm monitoring HasBacklogWithoutCapacity change from 'OK' to 'In alarm', and only then does the scale-out action occur. This creates a delay in scaling up. Is there any workaround to give priority to the scale-out alarm?

  • Changing the cooldown shouldn't be needed here. A scale-in cooldown doesn't block a scale-out action: https://docs.aws.amazon.com/autoscaling/application/userguide/application-auto-scaling-step-scaling-policies.html#step-scaling-cooldown

    The two alarms aren't connected to each other; the scale-out alarm doesn't have to wait for the scale-in alarm to change state. However, the conditions for the scale-out alarm have to be met before it can change to the 'ALARM' state. Based on your alarm settings, that means waiting for two consecutive breaching minutes.
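    One way to verify this independence is to inspect each alarm's recent state transitions (a sketch; the alarm names follow the question's naming scheme and the call requires AWS credentials):

    ```python
    import boto3

    def print_alarm_transitions(alarm_name: str, max_records: int = 10) -> None:
        """Print recent state transitions for a CloudWatch alarm."""
        cw = boto3.client("cloudwatch")
        history = cw.describe_alarm_history(
            AlarmName=alarm_name,
            HistoryItemType="StateUpdate",
            MaxRecords=max_records,
        )
        for item in history["AlarmHistoryItems"]:
            print(item["Timestamp"], item["HistorySummary"])

    # Hypothetical names following the question's naming scheme:
    # print_alarm_transitions("step_scaling_policy_alarm_name-my-async-endpoint")
    # print_alarm_transitions("step_scale-in_policy-my-async-endpoint")
    ```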
