1 Answer
You don't pay any compute costs while the endpoint is scaled down to 0 instances. But I think you can design it better. There are a few other options in SageMaker (assuming you are using a real-time endpoint):
- Try SageMaker Serverless Inference instead. It's purely serverless, so you pay only when the endpoint is serving inference requests. I think that would fit your requirement better.
- You could also use Lambda, which would reduce your hosting costs, but you'd have to do more work setting up the inference stack yourself.
- There is also SageMaker Asynchronous Inference, but it's mostly useful when each request takes a long time to process. I mention it because it also supports scaling to 0 when no traffic is coming in.
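To make the first and third options a bit more concrete, here is a rough sketch of the request parameters involved, assuming you deploy with boto3. The names `demo-config`, `demo-model`, `demo-endpoint` and the variant name `AllTraffic` are placeholders, and the memory and concurrency values are just illustrative defaults, so adjust them to your model. The helpers only build the keyword arguments; the actual AWS calls are isolated in `apply`, which needs boto3 and valid credentials.

```python
def serverless_endpoint_config(config_name, model_name,
                               memory_mb=2048, max_concurrency=5):
    """Build kwargs for the SageMaker client's create_endpoint_config.

    A variant with ServerlessConfig (instead of an instance type and
    count) makes the endpoint a Serverless Inference endpoint.
    """
    return {
        "EndpointConfigName": config_name,
        "ProductionVariants": [
            {
                "VariantName": "AllTraffic",  # placeholder variant name
                "ModelName": model_name,
                "ServerlessConfig": {
                    "MemorySizeInMB": memory_mb,
                    "MaxConcurrency": max_concurrency,
                },
            }
        ],
    }


def async_scale_to_zero_target(endpoint_name, variant_name="AllTraffic",
                               max_instances=1):
    """Build kwargs for Application Auto Scaling's register_scalable_target.

    MinCapacity=0 is what lets an asynchronous endpoint scale down to
    zero instances when there is no traffic.
    """
    return {
        "ServiceNamespace": "sagemaker",
        "ResourceId": f"endpoint/{endpoint_name}/variant/{variant_name}",
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
        "MinCapacity": 0,
        "MaxCapacity": max_instances,
    }


def apply(config_name, model_name, async_endpoint_name):
    """Send both requests to AWS (requires boto3 and credentials)."""
    import boto3  # imported lazily so the builders work without boto3

    boto3.client("sagemaker").create_endpoint_config(
        **serverless_endpoint_config(config_name, model_name)
    )
    boto3.client("application-autoscaling").register_scalable_target(
        **async_scale_to_zero_target(async_endpoint_name)
    )
```

Note that registering the scalable target only sets the allowed range. For the asynchronous endpoint you would still need a scaling policy (for example one based on the queue backlog) to actually move the instance count between 0 and the maximum.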
Thanks for your answer! As I understand it, I could also use SageMaker Batch Transform (given that my inputs are saved in an S3 bucket), and it would save my predictions automatically to an S3 output bucket. Do you think that inference type could also be useful for this use case?