How to set up autoscaling for an async SageMaker endpoint?

I'm working with the example documented here -> https://github.com/aws/amazon-sagemaker-examples/blob/main/async-inference/Async-Inference-Walkthrough.ipynb. I was able to set up the SageMaker model, endpoint config and async endpoint via Lambda, and now I'm trying to re-create the stack via Terraform. Following the Terraform documentation, I was able to set up the model, config and endpoint, but couldn't find how to set up the autoscaling (sample code below). Is this possible?

import boto3

endpoint_name = "<your-async-endpoint-name>"  # placeholder: replace with your endpoint name
client = boto3.client("application-autoscaling")
resource_id = "endpoint/" + endpoint_name + "/variant/" + "variant1"

# Register the endpoint variant as a scalable target (async endpoints can scale to 0).
response = client.register_scalable_target(
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    MinCapacity=0,
    MaxCapacity=5,
)

# Attach a target-tracking policy driven by the async request backlog.
response = client.put_scaling_policy(
    PolicyName="Invocations-ScalingPolicy",
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,  # endpoint/<endpoint-name>/variant/<variant-name>
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    PolicyType="TargetTrackingScaling",  # 'StepScaling'|'TargetTrackingScaling'
    TargetTrackingScalingPolicyConfiguration={
        # Target value for the metric; here the custom backlog metric below,
        # not the predefined SageMakerVariantInvocationsPerInstance.
        "TargetValue": 5.0,
        "CustomizedMetricSpecification": {
            "MetricName": "ApproximateBacklogSizePerInstance",
            "Namespace": "AWS/SageMaker",
            "Dimensions": [{"Name": "EndpointName", "Value": endpoint_name}],
            "Statistic": "Average",
        },
        "ScaleInCooldown": 600,
        # ... (remaining settings elided in the original post)
    },
)
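To get a feel for what "TargetValue": 5.0 on ApproximateBacklogSizePerInstance means, here is a rough mental model only: it assumes target tracking moves capacity roughly in proportion to how far the metric is from the target, then clamps to MinCapacity/MaxCapacity. Application Auto Scaling does not publish its exact algorithm, so the function name and the proportional rule are assumptions for illustration, not the service's behavior.

```python
import math


def approx_desired_capacity(
    current_instances: int,
    backlog_per_instance: float,
    target_value: float,
    min_capacity: int,
    max_capacity: int,
) -> int:
    """Hypothetical proportional model of target tracking (not the AWS algorithm).

    Estimates how many instances would bring the average backlog per
    instance down to target_value, clamped to [min_capacity, max_capacity].
    """
    if current_instances < 1:
        # A per-instance average is not meaningful when nothing is running.
        raise ValueError("model assumes at least one running instance")
    total_backlog = backlog_per_instance * current_instances
    desired = math.ceil(total_backlog / target_value)
    return max(min_capacity, min(max_capacity, desired))
```

With the question's settings, 2 instances each averaging a backlog of 15 give ceil(30 / 5) = 6, which is clamped to MaxCapacity = 5.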

Clean up:

response = client.deregister_scalable_target(
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,  # the variable defined above, not the string 'resource_id'
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
)
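The notebook's clean-up deregisters the variant as a scalable target before the endpoint is deleted. A minimal sketch of that ordering, assuming a hypothetical helper name and passing the clients in so the order can be checked with stand-ins; with real AWS access they would be boto3.client("application-autoscaling") and boto3.client("sagemaker"). Per the Application Auto Scaling docs, deregistering a scalable target also deletes the scaling policies attached to it.

```python
from typing import Any


def teardown_async_endpoint(
    autoscaling_client: Any,
    sagemaker_client: Any,
    endpoint_name: str,
    variant_name: str = "variant1",
) -> None:
    """Hypothetical helper: deregister the scalable target, then delete the endpoint."""
    resource_id = f"endpoint/{endpoint_name}/variant/{variant_name}"
    # Step 1: remove the variant's scalable target (and, with it, its policies).
    autoscaling_client.deregister_scalable_target(
        ServiceNamespace="sagemaker",
        ResourceId=resource_id,
        ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    )
    # Step 2: only after that, delete the endpoint itself.
    sagemaker_client.delete_endpoint(EndpointName=endpoint_name)
```

Usage (with real clients): teardown_async_endpoint(boto3.client("application-autoscaling"), boto3.client("sagemaker"), endpoint_name).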
  • Have you tried this yet? Did you get an error? This is the right approach.

  • @AWS-User-0823707 - yes, it works. I still have a few follow-up questions about this. Do you have any experience with it?

1 Answer
You can use the regular autoscaling configuration outlined in the doc here to configure the SageMaker async endpoint. There is nothing SageMaker-specific about it.

First, define an "aws_appautoscaling_target" with the minimum and maximum capacities. Then define your "TargetTrackingScaling" policy in an "aws_appautoscaling_policy" resource.
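Since the thread's code is Python, here is a sketch that emits the same two resources in Terraform's JSON configuration syntax (a *.tf.json file); HCL uses the same resource types and argument names. The resource labels async_variant and backlog_tracking, the function name, and the endpoint reference aws_sagemaker_endpoint.async_endpoint.name are hypothetical; the capacities and metric mirror the boto3 code in the question.

```python
import json


def build_autoscaling_tf_json(
    endpoint_ref: str,
    variant_name: str = "variant1",
    min_capacity: int = 0,
    max_capacity: int = 5,
    target_backlog: float = 5.0,
    scale_in_cooldown: int = 600,
) -> dict:
    """Hypothetical generator for a Terraform JSON document with an
    aws_appautoscaling_target and a target-tracking aws_appautoscaling_policy.

    endpoint_ref is a Terraform expression that yields the endpoint name.
    """
    target = "aws_appautoscaling_target.async_variant"
    return {
        "resource": {
            "aws_appautoscaling_target": {
                "async_variant": {
                    "service_namespace": "sagemaker",
                    # Interpolating the endpoint name makes the target depend on the endpoint.
                    "resource_id": f"endpoint/${{{endpoint_ref}}}/variant/{variant_name}",
                    "scalable_dimension": "sagemaker:variant:DesiredInstanceCount",
                    "min_capacity": min_capacity,
                    "max_capacity": max_capacity,
                }
            },
            "aws_appautoscaling_policy": {
                "backlog_tracking": {
                    "name": "Invocations-ScalingPolicy",
                    "policy_type": "TargetTrackingScaling",
                    # Referencing the target makes the policy depend on it.
                    "service_namespace": f"${{{target}.service_namespace}}",
                    "resource_id": f"${{{target}.resource_id}}",
                    "scalable_dimension": f"${{{target}.scalable_dimension}}",
                    "target_tracking_scaling_policy_configuration": {
                        "target_value": target_backlog,
                        "scale_in_cooldown": scale_in_cooldown,
                        "customized_metric_specification": {
                            "metric_name": "ApproximateBacklogSizePerInstance",
                            "namespace": "AWS/SageMaker",
                            "statistic": "Average",
                            "dimensions": [
                                {"name": "EndpointName", "value": f"${{{endpoint_ref}}}"}
                            ],
                        },
                    },
                }
            },
        }
    }


if __name__ == "__main__":
    # Redirect stdout to e.g. autoscaling.tf.json next to the endpoint definition.
    print(json.dumps(build_autoscaling_tf_json("aws_sagemaker_endpoint.async_endpoint.name"), indent=2))
```

Because the target's resource_id interpolates the endpoint name, Terraform should order creation endpoint, then target, then policy, and on destroy should remove them in reverse, deregistering the target before deleting the endpoint.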

AWS
answered 2 years ago
  • @AWS_Raghu - thanks, this is helpful. One follow-up question: in the original link I provided, the clean-up section states that we have to deregister the endpoint as a scalable target before deleting it (I have updated my question to add the clean-up sample code). I am assuming this is also not SageMaker-specific, so can this be done via Terraform?