I'm working with the example documented here: https://github.com/aws/amazon-sagemaker-examples/blob/main/async-inference/Async-Inference-Walkthrough.ipynb. I was able to set up the SageMaker model, endpoint config and async endpoint via Lambda, and now I'm trying to re-create the stack with Terraform. Based on the Terraform documentation, I was able to set up the model, endpoint config and endpoint, but I couldn't find how to set up auto scaling (sample code below).
Is this possible?
import boto3

# endpoint_name is assumed to hold the name of the async endpoint created earlier
client = boto3.client("application-autoscaling")
resource_id = "endpoint/" + endpoint_name + "/variant/" + "variant1"

# Register the endpoint variant as a scalable target
response = client.register_scalable_target(
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    MinCapacity=0,
    MaxCapacity=5,
)

# Attach a target-tracking policy driven by the async backlog size
response = client.put_scaling_policy(
    PolicyName="Invocations-ScalingPolicy",
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,  # endpoint/<endpoint name>/variant/<variant name>
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    PolicyType="TargetTrackingScaling",  # 'StepScaling'|'TargetTrackingScaling'
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 5.0,
        # SageMakerVariantInvocationsPerInstance (predefined metric name; not used here, the customized metric below is)
        "CustomizedMetricSpecification": {
            "MetricName": "ApproximateBacklogSizePerInstance",
            "Namespace": "AWS/SageMaker",
            "Dimensions": [{"Name": "EndpointName", "Value": endpoint_name}],
            "Statistic": "Average",
        },
        "ScaleInCooldown": 600,
        # ....
    },
)
# clean up
response = client.deregister_scalable_target(
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,  # the variable, not the literal string 'resource_id'
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
)
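Since the register, put-policy and deregister calls above all need the same three identifiers, here is a minimal sketch that builds them in one place so they cannot drift apart (for example, by passing a literal string instead of the variable during cleanup). The helper names `build_resource_id`, `target_kwargs` and `cleanup`, and the default variant name, are assumptions for illustration; they are not part of boto3 or the notebook.

```python
from typing import Any, Dict

SERVICE_NAMESPACE = "sagemaker"
SCALABLE_DIMENSION = "sagemaker:variant:DesiredInstanceCount"


def build_resource_id(endpoint_name: str, variant_name: str = "variant1") -> str:
    """Return the Application Auto Scaling resource ID for an endpoint variant."""
    return f"endpoint/{endpoint_name}/variant/{variant_name}"


def target_kwargs(endpoint_name: str, variant_name: str = "variant1") -> Dict[str, Any]:
    """Build the keyword arguments shared by the register, policy and deregister calls."""
    return {
        "ServiceNamespace": SERVICE_NAMESPACE,
        "ResourceId": build_resource_id(endpoint_name, variant_name),
        "ScalableDimension": SCALABLE_DIMENSION,
    }


def cleanup(client: Any, endpoint_name: str, variant_name: str = "variant1") -> Any:
    """Deregister the scalable target with the same identifiers used to register it.

    `client` is expected to be a boto3 "application-autoscaling" client
    (hypothetical helper; any object with the same method works).
    """
    return client.deregister_scalable_target(**target_kwargs(endpoint_name, variant_name))
```

With a real client this would be used as `client.register_scalable_target(**target_kwargs(endpoint_name), MinCapacity=0, MaxCapacity=5)` and later `cleanup(client, endpoint_name)`. As far as I know, the same three values are what the Terraform AWS provider's `aws_appautoscaling_target` and `aws_appautoscaling_policy` resources take as `service_namespace`, `resource_id` and `scalable_dimension`, so this mapping should carry over when re-creating the stack in Terraform.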
Have you tried this yet? Did you get an error? This is the right approach.
@AWS-User-0823707 - Yes, it works. I still have a few more follow-up questions about this. Do you have any experience with it?