Hello, you could, for example, define a CloudWatch alarm on the endpoint's Invocations metric (metric details at this link) so that, when the number of invocations over a given period falls below a specified threshold, the alarm sends a message to an SNS topic. You can then have a Lambda function triggered automatically by that message to perform a specific action on the endpoint.
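As a rough sketch of the alarm described above, the Python below builds the arguments for CloudWatch's `PutMetricAlarm` call (`put_metric_alarm` in boto3). The helper names, the alarm-name pattern, the one-hour period and the threshold of one invocation are illustrative assumptions. Treating missing data as `breaching` is also a judgment call: an idle endpoint may publish no `Invocations` data points at all, and with the default setting the alarm would sit in `INSUFFICIENT_DATA` instead of firing.

```python
def build_idle_alarm(endpoint_name, variant_name, sns_topic_arn,
                     period_seconds=3600, evaluation_periods=1, threshold=1):
    """Return keyword arguments for CloudWatch PutMetricAlarm (hypothetical helper).

    The alarm fires when the summed Invocations over the period drop below
    `threshold`, and notifies the given SNS topic.
    """
    if period_seconds % 60 != 0:
        raise ValueError("period_seconds must be a multiple of 60")
    return {
        "AlarmName": f"{endpoint_name}-idle",  # naming pattern is an assumption
        "AlarmDescription": f"Fewer than {threshold} invocations on {endpoint_name}",
        "Namespace": "AWS/SageMaker",
        "MetricName": "Invocations",
        "Dimensions": [
            {"Name": "EndpointName", "Value": endpoint_name},
            {"Name": "VariantName", "Value": variant_name},
        ],
        "Statistic": "Sum",
        "Period": period_seconds,
        "EvaluationPeriods": evaluation_periods,
        "Threshold": float(threshold),
        "ComparisonOperator": "LessThanThreshold",
        # An idle endpoint may emit no data points; count that as "idle".
        "TreatMissingData": "breaching",
        "AlarmActions": [sns_topic_arn],
    }


def create_idle_alarm(cloudwatch_client, **kwargs):
    """Create the alarm using any client exposing put_metric_alarm (e.g. boto3)."""
    cloudwatch_client.put_metric_alarm(**build_idle_alarm(**kwargs))
```

With boto3 you would pass `boto3.client("cloudwatch")` as the client; taking the client as a parameter keeps the argument-building logic testable without AWS credentials.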
A couple of references from AWS documentation:
- Create a CloudWatch alarm based on a static threshold - https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/ConsoleAlarms.html
- Using AWS Lambda with Amazon SNS - https://docs.aws.amazon.com/lambda/latest/dg/with-sns.html
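A minimal sketch of the Lambda side, assuming the function is subscribed to the alarm's SNS topic and that the "specific action" is deleting the endpoint (real-time endpoints have no stop operation, and `DeleteEndpoint` leaves the endpoint configuration and model in place, so the endpoint can be recreated later). It reads the `EndpointName` dimension from the alarm notification; the lowercase `name`/`value` keys reflect the JSON that CloudWatch alarm notifications typically carry, but check a real message before relying on them. The function names are hypothetical, the optional `sagemaker_client` argument exists only for testing, and the Lambda execution role would need permission to call `sagemaker:DeleteEndpoint`.

```python
import json


def endpoints_to_delete(event):
    """Extract endpoint names from SNS-delivered CloudWatch alarm messages in ALARM state."""
    names = []
    for record in event.get("Records", []):
        # SNS delivers the alarm notification as a JSON string.
        message = json.loads(record["Sns"]["Message"])
        if message.get("NewStateValue") != "ALARM":
            continue  # ignore OK / INSUFFICIENT_DATA transitions
        for dim in message.get("Trigger", {}).get("Dimensions", []):
            if dim.get("name") == "EndpointName":
                names.append(dim["value"])
    return names


def lambda_handler(event, context, sagemaker_client=None):
    """Lambda entry point: delete every endpoint whose idle alarm fired."""
    if sagemaker_client is None:
        import boto3  # available in the Lambda Python runtime
        sagemaker_client = boto3.client("sagemaker")
    deleted = []
    for name in endpoints_to_delete(event):
        sagemaker_client.delete_endpoint(EndpointName=name)
        deleted.append(name)
    return {"deleted": deleted}
```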
I think this blog outlines a better way: https://aws.amazon.com/blogs/machine-learning/save-costs-by-automatically-shutting-down-idle-resources-within-amazon-sagemaker-studio/
This only applies to SageMaker Studio resources (e.g., KernelGateways). In this case, the user is asking about SageMaker real-time endpoints.
This use case is better suited for SageMaker Serverless Inference. Serverless Inference is ideal for workloads which have idle periods between traffic spurts and can tolerate cold starts.
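For comparison, a serverless endpoint is created from an endpoint configuration whose production variant carries a `ServerlessConfig` instead of an instance type and count. The sketch below assumes a SageMaker model already exists (created with `CreateModel`); the helper names, the `-serverless` config-name suffix and the default memory/concurrency values are illustrative choices, and the allowed memory sizes listed are the documented 1 GB steps from 1024 to 6144 MB at the time of writing.

```python
# Memory sizes SageMaker Serverless Inference accepts (1 GB steps, 1-6 GB).
VALID_MEMORY_SIZES_MB = (1024, 2048, 3072, 4096, 5120, 6144)


def serverless_endpoint_config(config_name, model_name, memory_mb=2048, max_concurrency=5):
    """Return CreateEndpointConfig arguments for a serverless variant (hypothetical helper)."""
    if memory_mb not in VALID_MEMORY_SIZES_MB:
        raise ValueError(f"memory_mb must be one of {VALID_MEMORY_SIZES_MB}")
    if max_concurrency < 1:
        raise ValueError("max_concurrency must be at least 1")
    return {
        "EndpointConfigName": config_name,
        "ProductionVariants": [{
            "VariantName": "AllTraffic",
            "ModelName": model_name,
            # No InstanceType/InitialInstanceCount: capacity is managed by SageMaker.
            "ServerlessConfig": {
                "MemorySizeInMB": memory_mb,
                "MaxConcurrency": max_concurrency,
            },
        }],
    }


def deploy_serverless(sagemaker_client, endpoint_name, model_name, **kwargs):
    """Create the endpoint config and endpoint; returns the config name."""
    config_name = f"{endpoint_name}-serverless"
    sagemaker_client.create_endpoint_config(
        **serverless_endpoint_config(config_name, model_name, **kwargs))
    sagemaker_client.create_endpoint(
        EndpointName=endpoint_name, EndpointConfigName=config_name)
    return config_name
```

Because there are no instances to keep running, there is nothing to shut down between traffic spurts; the trade-off is the cold-start latency mentioned above.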
You might also consider SageMaker Asynchronous Inference, which enables you to save on costs by autoscaling the instance count to zero when there are no requests to process, so you only pay when your endpoint is processing requests.
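A sketch of the asynchronous option, assuming an existing SageMaker model and an S3 bucket for results: the endpoint config adds an `AsyncInferenceConfig`, and Application Auto Scaling is registered with `MinCapacity=0` plus a target-tracking policy on the `ApproximateBacklogSizePerInstance` metric. The helper names, the target backlog of 5, the cooldowns and the instance type are illustrative assumptions. The scalable target can only be registered once the endpoint is `InService`, and the AWS documentation on autoscaling asynchronous endpoints describes an additional policy for scaling back up from zero that is not shown here.

```python
SCALABLE_DIMENSION = "sagemaker:variant:DesiredInstanceCount"


def async_endpoint_config(config_name, model_name, s3_output_path,
                          instance_type="ml.m5.xlarge"):
    """Return CreateEndpointConfig arguments for an asynchronous endpoint (hypothetical helper)."""
    return {
        "EndpointConfigName": config_name,
        "ProductionVariants": [{
            "VariantName": "AllTraffic",
            "ModelName": model_name,
            "InstanceType": instance_type,
            "InitialInstanceCount": 1,  # autoscaling may later take this to 0
        }],
        "AsyncInferenceConfig": {"OutputConfig": {"S3OutputPath": s3_output_path}},
    }


def scale_to_zero_requests(endpoint_name, variant_name="AllTraffic",
                           max_capacity=2, target_backlog=5.0):
    """Return (RegisterScalableTarget, PutScalingPolicy) arguments allowing 0 instances."""
    resource_id = f"endpoint/{endpoint_name}/variant/{variant_name}"
    target = {
        "ServiceNamespace": "sagemaker",
        "ResourceId": resource_id,
        "ScalableDimension": SCALABLE_DIMENSION,
        "MinCapacity": 0,  # the key setting: allow scaling in to zero instances
        "MaxCapacity": max_capacity,
    }
    policy = {
        "PolicyName": f"{endpoint_name}-backlog-tracking",
        "ServiceNamespace": "sagemaker",
        "ResourceId": resource_id,
        "ScalableDimension": SCALABLE_DIMENSION,
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": target_backlog,  # queued requests per instance
            "CustomizedMetricSpecification": {
                "MetricName": "ApproximateBacklogSizePerInstance",
                "Namespace": "AWS/SageMaker",
                "Dimensions": [{"Name": "EndpointName", "Value": endpoint_name}],
                "Statistic": "Average",
            },
            "ScaleInCooldown": 600,
            "ScaleOutCooldown": 300,
        },
    }
    return target, policy


def apply_async_scaling(autoscaling_client, endpoint_name, **kwargs):
    """Register the target and policy with an Application Auto Scaling client."""
    target, policy = scale_to_zero_requests(endpoint_name, **kwargs)
    autoscaling_client.register_scalable_target(**target)
    autoscaling_client.put_scaling_policy(**policy)
```

With boto3, the client here would be `boto3.client("application-autoscaling")`.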
If that answer helps you, please mark it as the accepted answer.