What type of endpoint are you using? With Serverless Inference, you are billed for the time your allocated compute (sized by the memory you configure) spends serving requests; after a period of idle time, these resources are automatically scaled down and you are not charged. With real-time endpoints, a dedicated instance sits behind the endpoint at all times, so you are billed on demand for as long as the endpoint exists, whether or not it receives traffic.
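To make the difference concrete, here is a rough back-of-the-envelope sketch in Python. The function names and every rate below are hypothetical placeholders, not actual AWS prices; look up the current rates for your region and instance type on the SageMaker pricing page. The sketch also ignores data-processing and storage charges.

```python
# Rough monthly cost comparison between a real-time and a serverless endpoint.
# All rates are made-up placeholders for illustration, NOT real AWS prices.

HOURS_PER_MONTH = 730  # common approximation of hours in a month


def realtime_monthly_cost(hourly_rate_usd, instance_count=1, hours=HOURS_PER_MONTH):
    """A real-time endpoint is billed for every hour its instances are up."""
    return hourly_rate_usd * instance_count * hours


def serverless_monthly_cost(requests_per_month, avg_duration_s, price_per_second_usd):
    """A serverless endpoint is billed only for time spent processing requests."""
    return requests_per_month * avg_duration_s * price_per_second_usd


if __name__ == "__main__":
    # Hypothetical workload: a handful of requests per week, about 1 s each.
    always_on = realtime_monthly_cost(hourly_rate_usd=0.25)
    on_demand = serverless_monthly_cost(
        requests_per_month=20,
        avg_duration_s=1.0,
        price_per_second_usd=0.0001,
    )
    print(f"real-time:  ${always_on:.2f} per month")
    print(f"serverless: ${on_demand:.4f} per month")
```

For a workload that is idle almost all the time, the always-on term dominates, because the real-time cost depends only on uptime and not on the number of requests.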
Endpoint type: real-time, instance type ml.m4.xlarge. SageMaker is used very rarely, probably once per week or even less, but I see continuous costs every day. Do you think switching to SageMaker Serverless Inference would decrease the cost? We need real-time responses for most of our models (less than 200 ms). Would SageMaker Serverless Inference meet such requirements?