1 Answer
You don't pay any compute costs while the endpoint is scaled down to 0 instances. That said, there are a few other SageMaker options that may fit your design better (assuming you are using a real-time endpoint):
- Try SageMaker Serverless Inference instead. It is fully serverless, so you pay only while the endpoint is actually serving inference; I think that would fit your requirement better.
- You could also use Lambda, which would reduce your hosting costs, but you would have to set up the inference stack yourself.
- There is also SageMaker Asynchronous Inference. It is mostly useful for requests that take a long time to process, but I mention it because it also supports scaling to 0 when no traffic is coming in.
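For the Serverless Inference option, here is a minimal sketch of the variant configuration you would pass to `sagemaker:CreateEndpointConfig` via boto3. The names (`my-model`, memory size, concurrency) are placeholder assumptions, not values from your setup:

```python
# Sketch of a serverless endpoint-config variant. "my-model" is a
# placeholder for an existing SageMaker model name. With ServerlessConfig
# (instead of InstanceType/InitialInstanceCount) you pay per request and
# the endpoint scales to zero on idle.
serverless_variant = {
    "VariantName": "AllTraffic",
    "ModelName": "my-model",
    "ServerlessConfig": {
        "MemorySizeInMB": 2048,   # allowed range: 1024-6144 MB
        "MaxConcurrency": 5,      # cap on concurrent invocations
    },
}

# With AWS credentials configured, you would create the endpoint config:
# boto3.client("sagemaker").create_endpoint_config(
#     EndpointConfigName="my-serverless-config",
#     ProductionVariants=[serverless_variant],
# )
```

Note that the memory size and concurrency cap bound your cost and throughput, so size them to your model rather than copying these example values.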
Thanks for your answer! As I understand it, I could also use SageMaker Batch Transform (given that I have the inputs saved in an S3 bucket), and it would save my predictions automatically to an S3 output bucket. Do you think that inference type could also be useful for this use case?
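Something like the following request to `sagemaker:CreateTransformJob` is what I have in mind (bucket, model, and instance names here are hypothetical placeholders):

```python
# Sketch of a Batch Transform job request: read inputs from one S3
# prefix, write predictions to another. All names are placeholders.
transform_request = {
    "TransformJobName": "my-batch-predictions",
    "ModelName": "my-model",
    "TransformInput": {
        "DataSource": {
            "S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": "s3://my-input-bucket/inputs/",
            }
        },
        "ContentType": "text/csv",
    },
    "TransformOutput": {
        # Predictions are written here automatically when the job ends.
        "S3OutputPath": "s3://my-output-bucket/predictions/",
    },
    "TransformResources": {
        "InstanceType": "ml.m5.large",
        "InstanceCount": 1,
    },
}

# With AWS credentials configured, the job would be started with:
# boto3.client("sagemaker").create_transform_job(**transform_request)
```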