How to set up autoscaling for an async SageMaker endpoint?

Question

I'm working with the example documented here -> https://github.com/aws/amazon-sagemaker-examples/blob/main/async-inference/Async-Inference-Walkthrough.ipynb. I was able to set up the SageMaker model, config, and async endpoint via Lambda, and now I'm trying to re-create the stack via Terraform. Based on the Terraform documentation, I was able to set up the model, config, and endpoint, but couldn't find how to set up the autoscaling (sample code below).
Is this possible?

```
import boto3

# endpoint_name is the name of the async endpoint created earlier
client = boto3.client("application-autoscaling")
resource_id = "endpoint/" + endpoint_name + "/variant/" + "variant1"

# Register the endpoint variant as a scalable target
response = client.register_scalable_target(
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    MinCapacity=0,
    MaxCapacity=5,
)

# Attach a target-tracking policy based on the per-instance backlog size
response = client.put_scaling_policy(
    PolicyName="Invocations-ScalingPolicy",
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,  # endpoint variant resource ID
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    PolicyType="TargetTrackingScaling",  # 'StepScaling'|'TargetTrackingScaling'
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 5.0,  # target value for the metric specified below
        "CustomizedMetricSpecification": {
            "MetricName": "ApproximateBacklogSizePerInstance",
            "Namespace": "AWS/SageMaker",
            "Dimensions": [{"Name": "EndpointName", "Value": endpoint_name}],
            "Statistic": "Average",
        },
        "ScaleInCooldown": 600,
        # .... (remaining settings elided)
    },
)
```
Clean up:
```
response = client.deregister_scalable_target(
    ServiceNamespace='sagemaker',
    ResourceId=resource_id,  # the variable defined above, not a string literal
    ScalableDimension='sagemaker:variant:DesiredInstanceCount'
)
```

Answer

You will use the regular autoscaling config outlined in the doc here to configure it for the SageMaker async endpoint. Nothing about it is specific to SageMaker.

First, define an "aws_appautoscaling_target" with the minimum and maximum capacities. Then define your "TargetTrackingScaling" policy in the autoscaling policy (an "aws_appautoscaling_policy" resource).
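
As a rough sketch of those two steps, the boto3 calls from the question might translate to Terraform along these lines. The resource labels ("async_endpoint", "backlog_tracking") and the "endpoint_name" variable are placeholders I made up for illustration; the variant name, capacities, target value, and cooldown are copied from the question's code, so adjust them to your stack.

```hcl
# Hypothetical input: the name of the existing async endpoint.
variable "endpoint_name" {
  type = string
}

# Step 1: register the endpoint variant as a scalable target.
resource "aws_appautoscaling_target" "async_endpoint" {
  service_namespace  = "sagemaker"
  resource_id        = "endpoint/${var.endpoint_name}/variant/variant1"
  scalable_dimension = "sagemaker:variant:DesiredInstanceCount"
  min_capacity       = 0
  max_capacity       = 5
}

# Step 2: attach a target-tracking policy on the backlog-per-instance metric.
resource "aws_appautoscaling_policy" "backlog_tracking" {
  name               = "Invocations-ScalingPolicy"
  policy_type        = "TargetTrackingScaling"
  service_namespace  = aws_appautoscaling_target.async_endpoint.service_namespace
  resource_id        = aws_appautoscaling_target.async_endpoint.resource_id
  scalable_dimension = aws_appautoscaling_target.async_endpoint.scalable_dimension

  target_tracking_scaling_policy_configuration {
    target_value      = 5.0
    scale_in_cooldown = 600

    customized_metric_specification {
      metric_name = "ApproximateBacklogSizePerInstance"
      namespace   = "AWS/SageMaker"
      statistic   = "Average"

      dimensions {
        name  = "EndpointName"
        value = var.endpoint_name
      }
    }
  }
}
```

Referencing the target's attributes from the policy, rather than repeating the strings, should make Terraform create the target before the policy. If the endpoint is managed in the same configuration, you would likely reference that resource's name instead of a variable.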
