If I understood your question correctly, I think you may find the answer here [1]. To help you debug your endpoints you may check this [2], and to monitor a serverless endpoint you may check this [3].
Resources:
[1] https://docs.aws.amazon.com/sagemaker/latest/dg/monitoring-cloudwatch.html
[2] https://docs.aws.amazon.com/sagemaker/latest/dg/logging-cloudwatch.html
[3] https://docs.aws.amazon.com/sagemaker/latest/dg/serverless-endpoints-monitoring.html
Hi,
the SageMaker Serverless Inference pricing is well detailed at https://aws.amazon.com/sagemaker/pricing/
Amazon SageMaker Serverless Inference
Amazon SageMaker Serverless Inference enables you to deploy machine learning models
for inference without configuring or managing any of the underlying infrastructure.
You can either use on-demand Serverless Inference or add Provisioned Concurrency to your
endpoint for predictable performance.
With on-demand Serverless Inference, you only pay for the compute capacity used to process
inference requests, billed by the millisecond, and the amount of data processed. The compute charge depends on the memory configuration you choose.
This means that you need 4 input variables for your model and 1 parameter:
- parameter: the memory size that you select (it depends on the size of your model) -> cost/sec
- var1: number of inferences in a period, defined by your business case
- var2: average duration of an inference (coming from measurements in your initial tests)
- var3: average size of your prompt (if your model is an LLM)
- var4: average size of the prompt completion.
So the cost will be: cost/sec [for the chosen memory size] x (var1 x var2) + 0.16 x var1 x (var3 + var4)
The parameter gives you the price per second of inference; var1 x var2 gives you the total duration of inferences; var1 x (var3 + var4) gives you the amount of data processed.
If you go with the more advanced Provisioned Concurrency, you have to replace cost/sec with (Provisioned Concurrency Usage price per second + Inference Duration price per second) for your selected memory size.
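As a sanity check, the cost arithmetic above can be sketched in Python. This is a minimal illustration only: the function name and every numeric rate below are hypothetical placeholders, not official prices, so take the real per-second and per-GB rates for your memory size and region from https://aws.amazon.com/sagemaker/pricing/.

```python
import math


def serverless_inference_cost(
    price_per_sec: float,      # parameter: cost/sec for the chosen memory size
    num_inferences: int,       # var1: number of inferences in the period
    avg_duration_sec: float,   # var2: average inference duration (seconds)
    avg_prompt_gb: float,      # var3: average prompt size (GB)
    avg_completion_gb: float,  # var4: average completion size (GB)
    data_price_per_gb: float,  # data-processing price per GB (placeholder)
) -> float:
    # Compute charge: total inference seconds times the per-second rate.
    compute = price_per_sec * num_inferences * avg_duration_sec
    # Data-processing charge: total GB in and out times the per-GB rate.
    data = data_price_per_gb * num_inferences * (avg_prompt_gb + avg_completion_gb)
    return compute + data


# Hypothetical scenario: 10,000 inferences averaging 0.2 s each,
# with a 1 MB prompt and 2 MB completion per request.
monthly = serverless_inference_cost(
    price_per_sec=0.0001,
    num_inferences=10_000,
    avg_duration_sec=0.2,
    avg_prompt_gb=0.001,
    avg_completion_gb=0.002,
    data_price_per_gb=0.016,
)
print(round(monthly, 4))  # compute 0.2 + data 0.48
```

For Provisioned Concurrency, you would pass the sum of the Provisioned Concurrency Usage and Inference Duration per-second prices as `price_per_sec`.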
Best,
Didier