1 Answer
Hi User,
Real-time inference cost breaks down into two components:
- The per-hour charge for your instance
- Data processed in and out, charged per GB
In your case, you would be charged the per-hour price of the ml.c7g.16xlarge instance, plus data in/out of (2 MB + 2 MB) × 2 million requests, which comes to about 8,000 GB per month. Pricing examples can be found on the Amazon SageMaker pricing page.
If your usage will be consistent over a period of time, consider a Savings Plan to reduce cost.
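As a rough sketch of the arithmetic above: the function names are made up for illustration, and the rates are parameters you fill in from the pricing page for your region rather than real prices. It assumes 1 GB = 1,000 MB and that each request sends 2 MB in and gets 2 MB back.

```python
def monthly_data_gb(mb_in, mb_out, requests, mb_per_gb=1000):
    """Total data processed per month in GB (assumes 1 GB = 1000 MB)."""
    return (mb_in + mb_out) * requests / mb_per_gb


def monthly_cost(hourly_rate, hours, mb_in, mb_out, requests,
                 price_in_per_gb, price_out_per_gb, mb_per_gb=1000):
    """Instance-hours charge plus data in/out charge for one instance.

    All rates are inputs supplied by the caller; none are real AWS prices.
    """
    instance_charge = hourly_rate * hours
    data_in_charge = mb_in * requests / mb_per_gb * price_in_per_gb
    data_out_charge = mb_out * requests / mb_per_gb * price_out_per_gb
    return instance_charge + data_in_charge + data_out_charge


# 2 MB in + 2 MB out per request, 2 million requests per month
print(monthly_data_gb(2, 2, 2_000_000))  # 8000.0
```

Multiply the result by the number of instances if the endpoint runs more than one, since the hourly charge applies per instance.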
Hope it helps!
Got it, thank you! Could you help me understand at what point I would need more than one instance in my case? Would it be when the instance receives more than 64 requests at a time?
Hi User,
You can define your own auto scaling rules and apply them to your hosted model. Example rules include scaling on invocations per instance or on CPU utilization per instance. This lets the endpoint scale the number of instances out and in with load.
Reference Link: https://docs.aws.amazon.com/sagemaker/latest/dg/endpoint-auto-scaling.html
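A minimal sketch of an invocations-per-instance rule using boto3's Application Auto Scaling client. The endpoint name, variant name, capacity limits, target value, and cooldowns below are placeholders I picked for illustration, not recommendations; load test your model to find a sensible target. The helper functions are hypothetical, and `apply_scaling` needs boto3 installed and AWS credentials configured.

```python
def build_scaling_config(endpoint_name, variant_name, min_instances,
                         max_instances, invocations_target):
    """Build the two request payloads for Application Auto Scaling."""
    resource_id = f"endpoint/{endpoint_name}/variant/{variant_name}"
    common = {
        "ServiceNamespace": "sagemaker",
        "ResourceId": resource_id,
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
    }
    target = dict(common, MinCapacity=min_instances, MaxCapacity=max_instances)
    policy = dict(
        common,
        PolicyName="invocations-per-instance",  # placeholder name
        PolicyType="TargetTrackingScaling",
        TargetTrackingScalingPolicyConfiguration={
            "TargetValue": float(invocations_target),
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
            },
            "ScaleInCooldown": 300,   # seconds, assumed value
            "ScaleOutCooldown": 60,   # seconds, assumed value
        },
    )
    return target, policy


def apply_scaling(target, policy):
    """Register the variant as a scalable target and attach the policy."""
    import boto3  # imported here so the config can be built without boto3

    client = boto3.client("application-autoscaling")
    client.register_scalable_target(**target)
    client.put_scaling_policy(**policy)


# Placeholder endpoint and variant names
target, policy = build_scaling_config("my-endpoint", "AllTraffic", 1, 4, 100)
```

Calling `apply_scaling(target, policy)` would then register the rule against the endpoint. Target tracking adds instances when the average invocations per instance rises above the target, and removes them when it stays below.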