How to set up autoscaling for an async SageMaker endpoint?

I'm working with the example documented here -> https://github.com/aws/amazon-sagemaker-examples/blob/main/async-inference/Async-Inference-Walkthrough.ipynb. I was able to set up the SageMaker model, endpoint config, and async endpoint via Lambda; now I'm trying to re-create the stack with Terraform. Following the Terraform documentation, I was able to set up the model, config, and endpoint, but I couldn't find how to set up autoscaling (sample code below). Is this possible?

import boto3

# endpoint_name is the name of the async endpoint created earlier
client = boto3.client("application-autoscaling")
resource_id = "endpoint/" + endpoint_name + "/variant/" + "variant1"  # variant1 = production variant name

# Register the variant's instance count as a scalable target (0 to 5 instances)
response = client.register_scalable_target(
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    MinCapacity=0,
    MaxCapacity=5,
)

# Target-tracking policy driven by the async backlog metric
response = client.put_scaling_policy(
    PolicyName="Invocations-ScalingPolicy",
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,  # endpoint/<endpoint name>/variant/<variant name>
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    PolicyType="TargetTrackingScaling",  # 'StepScaling'|'TargetTrackingScaling'
    TargetTrackingScalingPolicyConfiguration={
        # Target value for the customized metric below (ApproximateBacklogSizePerInstance);
        # SageMakerVariantInvocationsPerInstance would be the predefined-metric alternative
        "TargetValue": 5.0,
        "CustomizedMetricSpecification": {
            "MetricName": "ApproximateBacklogSizePerInstance",
            "Namespace": "AWS/SageMaker",
            "Dimensions": [{"Name": "EndpointName", "Value": endpoint_name}],
            "Statistic": "Average",
        },
        "ScaleInCooldown": 600,
        # .... (remaining settings elided in the question)
    },
)
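Because both calls share the same three identifying fields (service namespace, resource ID, scalable dimension), it can help to build their arguments in one place. Below is a minimal sketch of a hypothetical helper, build_async_autoscaling_requests (not part of boto3), that returns the keyword arguments for register_scalable_target and put_scaling_policy using the values from the snippet above. The endpoint name "my-async-endpoint" is a placeholder, and the elided policy settings are deliberately left out.

```python
# Hypothetical helper (not a boto3 API): builds the keyword arguments for
# register_scalable_target and put_scaling_policy as plain dicts, so they
# can be inspected or tested without AWS credentials.
from typing import Any, Dict, Tuple


def build_async_autoscaling_requests(
    endpoint_name: str,
    variant_name: str = "variant1",
    min_capacity: int = 0,
    max_capacity: int = 5,
    target_backlog_per_instance: float = 5.0,
    scale_in_cooldown: int = 600,
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Return (register_scalable_target kwargs, put_scaling_policy kwargs)."""
    # Application Auto Scaling identifies a SageMaker variant by this path.
    resource_id = f"endpoint/{endpoint_name}/variant/{variant_name}"
    common = {
        "ServiceNamespace": "sagemaker",
        "ResourceId": resource_id,
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
    }
    register_kwargs = {**common, "MinCapacity": min_capacity, "MaxCapacity": max_capacity}
    policy_kwargs = {
        **common,
        "PolicyName": "Invocations-ScalingPolicy",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": target_backlog_per_instance,
            "CustomizedMetricSpecification": {
                "MetricName": "ApproximateBacklogSizePerInstance",
                "Namespace": "AWS/SageMaker",
                "Dimensions": [{"Name": "EndpointName", "Value": endpoint_name}],
                "Statistic": "Average",
            },
            "ScaleInCooldown": scale_in_cooldown,
        },
    }
    return register_kwargs, policy_kwargs


# Usage with a real client (placeholder endpoint name):
# register_kwargs, policy_kwargs = build_async_autoscaling_requests("my-async-endpoint")
# client.register_scalable_target(**register_kwargs)
# client.put_scaling_policy(**policy_kwargs)
```

Keeping the resource ID in one place avoids the two calls drifting apart if the variant name changes.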

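Clean up:

# Deregister the variant as a scalable target before deleting the endpoint
response = client.deregister_scalable_target(
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,  # the variable defined above, not the string 'resource_id'
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
)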
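The clean-up order from the notebook (deregister the scalable target, then delete the endpoint) can be wrapped in a small function. This is a sketch with a hypothetical name, tear_down_async_endpoint; it takes the boto3 clients as arguments so it can be exercised with stand-ins, and it assumes the production variant is named variant1 as above. According to the Application Auto Scaling documentation, deregistering a scalable target also deletes the scaling policies attached to it, so no separate delete_scaling_policy call is included.

```python
# Hypothetical teardown helper: deregister the scalable target first,
# then delete the endpoint, following the notebook's clean-up order.
from typing import Any


def tear_down_async_endpoint(
    autoscaling_client: Any,
    sagemaker_client: Any,
    endpoint_name: str,
    variant_name: str = "variant1",
) -> None:
    resource_id = f"endpoint/{endpoint_name}/variant/{variant_name}"
    # Deregistering also removes the scaling policies attached to this target.
    autoscaling_client.deregister_scalable_target(
        ServiceNamespace="sagemaker",
        ResourceId=resource_id,
        ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    )
    # Only after that, delete the endpoint itself.
    sagemaker_client.delete_endpoint(EndpointName=endpoint_name)


# Usage with real clients:
# tear_down_async_endpoint(boto3.client("application-autoscaling"),
#                          boto3.client("sagemaker"), endpoint_name)
```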
  • Have you tried this yet? Did you get an error? This is the right approach.

  • @AWS-User-0823707 - Yes, it works. I still have a few follow-up questions about this. Do you have any experience with it?

Asked 2 years ago, viewed 1,226 times
1 answer

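You would use the regular Application Auto Scaling configuration outlined in the doc here to configure autoscaling for the SageMaker async endpoint. Nothing about it is SageMaker-specific beyond the values you pass: the "sagemaker" service namespace, the endpoint/<endpoint name>/variant/<variant name> resource ID, and the "sagemaker:variant:DesiredInstanceCount" scalable dimension.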

First, define an "aws_appautoscaling_target" resource with the minimum and maximum capacities. Then define an "aws_appautoscaling_policy" resource with policy_type = "TargetTrackingScaling" that points at that target.
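To make the mapping from the boto3 calls to Terraform concrete, here is a minimal sketch of a hypothetical Python helper, render_terraform (not part of any library), that renders the two resources from the same values the question's snippet uses. The resource labels async_endpoint and async_backlog are placeholders; the argument names follow the AWS provider's aws_appautoscaling_target and aws_appautoscaling_policy resources, but check them against the provider version you use.

```python
# Hypothetical generator: renders the Terraform equivalent of the
# register_scalable_target / put_scaling_policy calls as an HCL string.
# Resource labels are placeholders; "{{" and "}}" emit literal braces.
def render_terraform(
    endpoint_name: str,
    variant_name: str = "variant1",
    min_capacity: int = 0,
    max_capacity: int = 5,
    target_value: float = 5.0,
    scale_in_cooldown: int = 600,
) -> str:
    return f'''resource "aws_appautoscaling_target" "async_endpoint" {{
  service_namespace  = "sagemaker"
  resource_id        = "endpoint/{endpoint_name}/variant/{variant_name}"
  scalable_dimension = "sagemaker:variant:DesiredInstanceCount"
  min_capacity       = {min_capacity}
  max_capacity       = {max_capacity}
}}

resource "aws_appautoscaling_policy" "async_backlog" {{
  name               = "Invocations-ScalingPolicy"
  policy_type        = "TargetTrackingScaling"
  service_namespace  = aws_appautoscaling_target.async_endpoint.service_namespace
  resource_id        = aws_appautoscaling_target.async_endpoint.resource_id
  scalable_dimension = aws_appautoscaling_target.async_endpoint.scalable_dimension

  target_tracking_scaling_policy_configuration {{
    target_value      = {target_value}
    scale_in_cooldown = {scale_in_cooldown}

    customized_metric_specification {{
      metric_name = "ApproximateBacklogSizePerInstance"
      namespace   = "AWS/SageMaker"
      statistic   = "Average"

      dimensions {{
        name  = "EndpointName"
        value = "{endpoint_name}"
      }}
    }}
  }}
}}
'''


print(render_terraform("my-async-endpoint"))
```

Passing an interpolation such as "${aws_sagemaker_endpoint.async.name}" (hypothetical label) as endpoint_name makes the target reference the endpoint resource. Terraform then destroys dependents before their dependencies, so on terraform destroy the scalable target should be deregistered before the endpoint is deleted, which matches the clean-up order in the notebook.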

AWS
Answered 2 years ago
  • @AWS_Raghu - Thanks, this is helpful. One follow-up question: in the clean-up section of the original link I provided, it states that we have to deregister the endpoint as a scalable target before deleting it (I have updated my question to add the clean-up sample code). I'm assuming this is also not SageMaker-specific, so can this be done via Terraform?