How to set up autoscaling for an async SageMaker endpoint?


I'm working with the example documented here: https://github.com/aws/amazon-sagemaker-examples/blob/main/async-inference/Async-Inference-Walkthrough.ipynb. I was able to set up the SageMaker model, endpoint config, and async endpoint via Lambda, and now I'm trying to re-create the stack via Terraform. Based on the Terraform documentation, I was able to set up the model, config, and endpoint, but couldn't find how to set up the autoscaling (sample code below). Is this possible?

import boto3

client = boto3.client("application-autoscaling")
# endpoint_name is the name of the async endpoint created earlier
resource_id = "endpoint/" + endpoint_name + "/variant/" + "variant1"

# Register the endpoint variant as a scalable target (0 to 5 instances)
response = client.register_scalable_target(
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    MinCapacity=0,
    MaxCapacity=5,
)

# Attach a target-tracking policy on the per-instance backlog metric
response = client.put_scaling_policy(
    PolicyName="Invocations-ScalingPolicy",
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,  # endpoint/<name>/variant/<variant>
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    PolicyType="TargetTrackingScaling",  # 'StepScaling'|'TargetTrackingScaling'
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 5.0,  # target value for the customized metric below
        "CustomizedMetricSpecification": {
            "MetricName": "ApproximateBacklogSizePerInstance",
            "Namespace": "AWS/SageMaker",
            "Dimensions": [{"Name": "EndpointName", "Value": endpoint_name}],
            "Statistic": "Average",
        },
        "ScaleInCooldown": 600,
        # .... (remaining settings elided in the original)
    },
)

Clean up:

response = client.deregister_scalable_target(
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,  # the variable defined above, not the literal string "resource_id"
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
)
  • Have you tried this yet? Did you get an error? This is the right approach.

  • @AWS-User-0823707 - Yes, it works. I still have a few follow-up questions about this. Do you have any experience with it?

1 Answer

You will use the regular autoscaling config outlined in the doc here to configure it for the SageMaker async endpoint. Nothing about it is specific to SageMaker.

First, define the "aws_appautoscaling_target" with minimum and maximum capacities. Then define your "TargetTrackingScaling" configuration in the autoscaling policy.
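To make the two steps above concrete, here is a minimal sketch in Python that builds a Terraform JSON document (Terraform also accepts `.tf.json` files) mirroring the boto3 calls from the question. It is not a confirmed configuration: the argument names are the ones I recall from the AWS provider's `aws_appautoscaling_target` and `aws_appautoscaling_policy` resources and should be checked against the provider documentation, and the resource labels `async_endpoint` and `backlog_tracking`, the function name, and the endpoint name `my-async-endpoint` are hypothetical. Only the settings visible in the question are included; the ones elided there are left out.

```python
import json


def autoscaling_tf_json(endpoint_name, variant_name="variant1",
                        min_capacity=0, max_capacity=5,
                        target_value=5.0, scale_in_cooldown=600):
    """Build a Terraform JSON document mirroring the boto3 calls in the question."""
    resource_id = f"endpoint/{endpoint_name}/variant/{variant_name}"
    target_ref = "aws_appautoscaling_target.async_endpoint"  # hypothetical label
    return {
        "resource": {
            # Step 1: the scalable target with minimum and maximum capacities.
            "aws_appautoscaling_target": {
                "async_endpoint": {
                    "service_namespace": "sagemaker",
                    "resource_id": resource_id,
                    "scalable_dimension": "sagemaker:variant:DesiredInstanceCount",
                    "min_capacity": min_capacity,
                    "max_capacity": max_capacity,
                }
            },
            # Step 2: the target-tracking policy attached to that target.
            "aws_appautoscaling_policy": {
                "backlog_tracking": {
                    "name": "Invocations-ScalingPolicy",
                    "policy_type": "TargetTrackingScaling",
                    # Referencing the target lets Terraform order the two resources.
                    "service_namespace": "${" + target_ref + ".service_namespace}",
                    "resource_id": "${" + target_ref + ".resource_id}",
                    "scalable_dimension": "${" + target_ref + ".scalable_dimension}",
                    "target_tracking_scaling_policy_configuration": {
                        "target_value": target_value,
                        "scale_in_cooldown": scale_in_cooldown,
                        "customized_metric_specification": {
                            "metric_name": "ApproximateBacklogSizePerInstance",
                            "namespace": "AWS/SageMaker",
                            "statistic": "Average",
                            "dimensions": [
                                {"name": "EndpointName", "value": endpoint_name}
                            ],
                        },
                    },
                }
            },
        }
    }


if __name__ == "__main__":
    # Redirect this output to a file such as autoscaling.tf.json (hypothetical name).
    print(json.dumps(autoscaling_tf_json("my-async-endpoint"), indent=2))
```

In practice the endpoint name would more likely come from a reference to the existing endpoint resource than from a hard-coded string, so that Terraform creates the endpoint before the scalable target.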

AWS
answered 2 years ago
  • @AWS_Raghu - Thanks, this is helpful. One follow-up question: the clean-up section of the original link I provided states that we have to deregister the endpoint as a scalable target before deleting it (I have updated my question to add the clean-up sample code). I am assuming this is also not SageMaker-specific, so can this be done via Terraform?
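The ordering described in this comment (deregister the scalable target first, then delete the endpoint) can be sketched with boto3 as below. This is an illustration, not the walkthrough's own code: the helper name `delete_endpoint_with_autoscaling` and the endpoint name in the usage comment are hypothetical, and the clients are passed in so the ordering can be checked without AWS access. On the Terraform side, my assumption (worth verifying) is that destroying the `aws_appautoscaling_target` resource performs the deregistration.

```python
def delete_endpoint_with_autoscaling(autoscaling_client, sagemaker_client,
                                     endpoint_name, variant_name="variant1"):
    """Deregister the variant as a scalable target, then delete the endpoint."""
    resource_id = f"endpoint/{endpoint_name}/variant/{variant_name}"

    # Step 1: remove the Application Auto Scaling registration.
    autoscaling_client.deregister_scalable_target(
        ServiceNamespace="sagemaker",
        ResourceId=resource_id,
        ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    )

    # Step 2: only then delete the endpoint itself.
    sagemaker_client.delete_endpoint(EndpointName=endpoint_name)
    return resource_id


# Usage (requires boto3 and AWS credentials):
#   import boto3
#   delete_endpoint_with_autoscaling(
#       boto3.client("application-autoscaling"),
#       boto3.client("sagemaker"),
#       "my-async-endpoint",
#   )
```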
