Hi, SageMaker Serverless Inference is cost-effective for low or intermittent traffic because of its serverless nature: https://docs.aws.amazon.com/sagemaker/latest/dg/serverless-endpoints.html
For Serverless Inference with Provisioned Concurrency, you pay for the compute
capacity used to process inference requests, billed by the millisecond, and the
amount of data processed. You also pay for Provisioned Concurrency usage,
based on the memory configured, duration provisioned, and the amount of
concurrency enabled.
Pricing is detailed here: https://aws.amazon.com/sagemaker/pricing/
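To get a rough feel for how those billing components add up, here is a minimal sketch in Python. It is only an illustration of the structure described above (compute time, data processed, Provisioned Concurrency), not an official calculator. The class `ServerlessRates`, the function `estimate_cost`, and every rate value are hypothetical placeholders; take the real per-second and per-GB rates for your region and memory size from the pricing page.

```python
from dataclasses import dataclass


@dataclass
class ServerlessRates:
    """Hypothetical rates: replace with values from the pricing page."""
    compute_per_second: float      # USD per second of inference at your memory size
    provisioned_per_second: float  # USD per second per unit of provisioned concurrency
    data_per_gb: float             # USD per GB of data processed


def estimate_cost(rates, requests, avg_duration_ms, data_gb,
                  provisioned_concurrency=0, provisioned_seconds=0.0):
    """Sum the three billing components for one billing period."""
    # Compute is billed by the millisecond, so convert total duration to seconds.
    compute = requests * (avg_duration_ms / 1000.0) * rates.compute_per_second
    # Data processed in and out of the endpoint.
    data = data_gb * rates.data_per_gb
    # Provisioned Concurrency is charged for the time it stays enabled,
    # multiplied by the amount of concurrency configured.
    provisioned = (provisioned_concurrency * provisioned_seconds
                   * rates.provisioned_per_second)
    return {
        "compute": compute,
        "data": data,
        "provisioned": provisioned,
        "total": compute + data + provisioned,
    }


if __name__ == "__main__":
    # Placeholder rates only, NOT real AWS prices.
    rates = ServerlessRates(compute_per_second=0.0001,
                            provisioned_per_second=0.00001,
                            data_per_gb=0.01)
    print(estimate_cost(rates, requests=1000, avg_duration_ms=200, data_gb=5,
                        provisioned_concurrency=2, provisioned_seconds=3600))
```

With low traffic and no Provisioned Concurrency, the `provisioned` term is zero and you only pay for the requests you actually serve, which is where the serverless option tends to be cheapest.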
Hope it helps!
Didier