Hello, I'm running into the exact same issue. I used the same guide and the async endpoint doesn't scale up or down.
Hi, hope you are fine. Thanks for getting back to me. This is what I am using:
# Configure autoscaling on the asynchronous endpoint, down to zero instances
import boto3

client = boto3.client("application-autoscaling")

# resource_id and endpoint_name are assumed to be defined earlier;
# resource_id has the form "endpoint/<endpoint-name>/variant/<variant-name>"
response = client.register_scalable_target(
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    MinCapacity=0,
    MaxCapacity=4,
)
response = client.put_scaling_policy(
    PolicyName="Invocations-ScalingPolicy",
    ServiceNamespace="sagemaker",  # The namespace of the AWS service that provides the resource.
    ResourceId=resource_id,  # Endpoint variant resource ID, not just the endpoint name
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",  # SageMaker supports only instance count
    PolicyType="TargetTrackingScaling",  # 'StepScaling'|'TargetTrackingScaling'
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 2.0,  # The target value for the metric; here the metric is ApproximateBacklogSizePerInstance
        "CustomizedMetricSpecification": {
            "MetricName": "ApproximateBacklogSizePerInstance",
            "Namespace": "AWS/SageMaker",
            "Dimensions": [{"Name": "EndpointName", "Value": endpoint_name}],
            "Statistic": "Average",
        },
        # The cooldown period helps prevent launching or terminating additional instances
        # before the effects of previous activities are visible.
        # You can configure the length of time based on your instance startup time or other application needs.
        # ScaleInCooldown: time in seconds after a scale-in activity completes before another scale-in activity can start.
        "ScaleInCooldown": 300,
        # ScaleOutCooldown: time in seconds after a scale-out activity completes before another scale-out activity can start.
        "ScaleOutCooldown": 300,
        # 'DisableScaleIn': True|False - indicates whether scale-in by the target tracking policy is disabled.
        # If the value is true, scale-in is disabled and the target tracking policy won't remove capacity from the scalable resource.
    },
)
I think the ScaleOutCooldown period is reasonable here. I also tried sending 40 invocations, but even then the endpoint did not scale up from 0.
Was it a sustained load test? I ask because you have set a high value for ScaleInCooldown and ScaleOutCooldown, i.e. 5 minutes. If your tests are not sustained and do not invoke the endpoint consistently, it won't scale up. For your tests, maybe set them to a lower value so you can see autoscaling happen. Ref notebook: https://github.com/aws/amazon-sagemaker-examples/blob/main/async-inference/Async-Inference-Walkthrough.ipynb
Hi, thanks for helping me out. I don't get what you mean by sustained load. Also, I followed the link you provided to reach my solution. The reason for the high number is that my pipeline requires heavy installation and downloads, and it takes almost 7-8 minutes to get an instance into a running state. That's why I did so. I also waited more than 20-30 minutes, but the endpoint didn't scale up from 0 to 1 instance.
Thanks
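For anyone landing here with the same symptom: as far as I understand the AWS documentation on autoscaling asynchronous endpoints, a target tracking policy on ApproximateBacklogSizePerInstance is not what brings an endpoint up from zero instances. The documented approach is an additional step scaling policy driven by a CloudWatch alarm on the HasBacklogWithoutCapacity metric. Below is a minimal sketch of that setup, not a verified fix for this thread. The policy name, alarm name, the "AllTraffic" variant name, and the helper function names are my own assumptions; the threshold and period values are illustrative and should be tuned.

```python
"""Sketch: scale an async endpoint up from zero with a step scaling policy."""


def build_step_scaling_policy(resource_id, cooldown_seconds=300):
    """Keyword arguments for application-autoscaling put_scaling_policy."""
    return {
        "PolicyName": "HasBacklogWithoutCapacity-ScalingPolicy",  # hypothetical name
        "ServiceNamespace": "sagemaker",
        "ResourceId": resource_id,
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
        "PolicyType": "StepScaling",
        "StepScalingPolicyConfiguration": {
            "AdjustmentType": "ChangeInCapacity",  # add a fixed number of instances
            "MetricAggregationType": "Average",
            "Cooldown": cooldown_seconds,
            # When the alarm fires, add one instance.
            "StepAdjustments": [
                {"MetricIntervalLowerBound": 0, "ScalingAdjustment": 1}
            ],
        },
    }


def build_backlog_alarm(endpoint_name, policy_arn):
    """Keyword arguments for cloudwatch put_metric_alarm."""
    return {
        "AlarmName": f"{endpoint_name}-has-backlog-without-capacity",  # hypothetical name
        "MetricName": "HasBacklogWithoutCapacity",
        "Namespace": "AWS/SageMaker",
        "Statistic": "Average",
        "Period": 60,  # illustrative values; tune for your workload
        "EvaluationPeriods": 2,
        "DatapointsToAlarm": 2,
        "Threshold": 1,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "TreatMissingData": "missing",
        "Dimensions": [{"Name": "EndpointName", "Value": endpoint_name}],
        "AlarmActions": [policy_arn],  # the alarm triggers the step scaling policy
    }


def apply_scale_from_zero(endpoint_name, variant_name="AllTraffic"):
    """Create the policy and alarm. Requires AWS credentials and boto3."""
    import boto3  # imported here so the builders above work without AWS access

    # Assumption: the variant is named "AllTraffic"; use your own variant name.
    resource_id = f"endpoint/{endpoint_name}/variant/{variant_name}"
    autoscaling = boto3.client("application-autoscaling")
    cloudwatch = boto3.client("cloudwatch")

    policy = autoscaling.put_scaling_policy(**build_step_scaling_policy(resource_id))
    cloudwatch.put_metric_alarm(
        **build_backlog_alarm(endpoint_name, policy["PolicyARN"])
    )
```

Calling `apply_scale_from_zero("my-async-endpoint")` (a placeholder endpoint name) would register both pieces next to the existing target tracking policy, which then handles scaling between 1 and MaxCapacity and back down to 0.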
Can you please share your async endpoint config and your autoscaling policy? It may not have scaled up because of the higher ScaleOutCooldown times and the TargetTrackingScalingPolicyConfiguration. For instance, if you have set the target tracking policy to scale based on ApproximateBacklogSizePerInstance with a lower number, and there are not enough requests in the backlog, the autoscaling policy will not be triggered.
Did you solve this?
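To answer the "share your config and policy" request and to see whether the backlog ever reached the target, something like the sketch below may help. It lists the registered scaling policies, the recent scaling activities, and the last 30 minutes of ApproximateBacklogSizePerInstance. The function names and the "AllTraffic" variant name are assumptions of mine, and the calls need AWS credentials with read access to Application Auto Scaling and CloudWatch.

```python
from datetime import datetime, timedelta, timezone


def backlog_metric_query(endpoint_name, minutes=30, now=None):
    """Keyword arguments for cloudwatch get_metric_statistics."""
    now = now or datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/SageMaker",
        "MetricName": "ApproximateBacklogSizePerInstance",
        "Dimensions": [{"Name": "EndpointName", "Value": endpoint_name}],
        "StartTime": now - timedelta(minutes=minutes),
        "EndTime": now,
        "Period": 60,  # one datapoint per minute
        "Statistics": ["Average"],
    }


def print_scaling_state(endpoint_name, variant_name="AllTraffic"):
    """Print policies, recent scaling activities, and backlog datapoints."""
    import boto3  # imported here so the query builder works without AWS access

    # Assumption: the variant is named "AllTraffic"; use your own variant name.
    resource_id = f"endpoint/{endpoint_name}/variant/{variant_name}"
    autoscaling = boto3.client("application-autoscaling")
    cloudwatch = boto3.client("cloudwatch")

    policies = autoscaling.describe_scaling_policies(
        ServiceNamespace="sagemaker", ResourceId=resource_id
    )
    for policy in policies["ScalingPolicies"]:
        print(policy["PolicyName"], policy["PolicyType"])

    activities = autoscaling.describe_scaling_activities(
        ServiceNamespace="sagemaker", ResourceId=resource_id
    )
    for activity in activities["ScalingActivities"]:
        print(activity["StartTime"], activity["StatusCode"], activity["Description"])

    stats = cloudwatch.get_metric_statistics(**backlog_metric_query(endpoint_name))
    for point in sorted(stats["Datapoints"], key=lambda p: p["Timestamp"]):
        print(point["Timestamp"], point["Average"])
```

If the activity list stays empty while requests are queued, that suggests no policy or alarm is firing at all, rather than the cooldown delaying a scale-out.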