1 Answer
Hi User,
Real-time inference cost can be broken down into two components:
- Per-hour charges for your instance
- Data in/out, charged per GB
In your case, you would be charged the per-hour price of the ml.c7g.16xlarge instance for every hour the endpoint is running, plus data in/out of (2 MB + 2 MB) × 2 million requests, which is about 8 million MB (roughly 8 TB) per month. Link to pricing examples can be found here.
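To make the arithmetic concrete, here is a rough sketch of the estimate in Python. The hourly rate and per-GB rates below are placeholders I made up, not actual AWS prices, so substitute the values from the pricing page for your region. The function name, the 730 hours per month, and the 1 GB = 1000 MB conversion are also my assumptions.

```python
def estimate_monthly_cost(
    hourly_rate_usd,
    requests_per_month,
    mb_in_per_request,
    mb_out_per_request,
    usd_per_gb_in,
    usd_per_gb_out,
    instance_count=1,
    hours_per_month=730,   # assumption: endpoint runs all month
    mb_per_gb=1000,        # assumption: decimal GB
):
    """Rough real-time endpoint estimate: instance hours plus data in/out."""
    # Component 1: per-hour instance charges, billed while the endpoint is up.
    instance_cost = hourly_rate_usd * hours_per_month * instance_count

    # Component 2: data processed in and out, billed per GB.
    gb_in = requests_per_month * mb_in_per_request / mb_per_gb
    gb_out = requests_per_month * mb_out_per_request / mb_per_gb
    data_cost = gb_in * usd_per_gb_in + gb_out * usd_per_gb_out

    return {
        "instance_cost": instance_cost,
        "gb_in": gb_in,
        "gb_out": gb_out,
        "data_cost": data_cost,
        "total": instance_cost + data_cost,
    }


# Placeholder rates only; look up the real ones before relying on the result.
estimate = estimate_monthly_cost(
    hourly_rate_usd=1.00,        # hypothetical, NOT the ml.c7g.16xlarge price
    requests_per_month=2_000_000,
    mb_in_per_request=2,
    mb_out_per_request=2,
    usd_per_gb_in=0.01,          # hypothetical
    usd_per_gb_out=0.01,         # hypothetical
)
print(estimate)
```

Note that the instance component does not depend on request volume at all, so for a large instance running all month it will usually dominate unless the payloads are very large.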
If your usage will be consistent over a period of time, consider a Savings Plan to reduce cost.
Hope it helps!
Got it, thank you! Could you help me understand at what point I would need more than one instance in my case? Would it be when the instance receives more than 64 requests at a time?
Hi User,
You can define your own auto scaling rules and apply them to your hosted model. Example rules include invocations per instance and CPU utilization per instance. These rules scale the number of instances up and down as load changes.
Reference Link: https://docs.aws.amazon.com/sagemaker/latest/dg/endpoint-auto-scaling.html
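As an illustration, an invocations-per-instance rule can be expressed as a target tracking policy through Application Auto Scaling. The sketch below only builds the request payloads, and `apply_autoscaling` shows how they might be sent with boto3. The endpoint name, variant name, capacity limits, target value, and cooldowns are all hypothetical; the right target depends on load testing your own model.

```python
def build_scaling_requests(endpoint_name, variant_name, min_instances,
                           max_instances, invocations_per_instance_target):
    """Build Application Auto Scaling payloads for a SageMaker endpoint variant."""
    resource_id = f"endpoint/{endpoint_name}/variant/{variant_name}"
    common = {
        "ServiceNamespace": "sagemaker",
        "ResourceId": resource_id,
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
    }
    # Step 1: declare the variant as scalable, with lower and upper bounds.
    target = dict(common, MinCapacity=min_instances, MaxCapacity=max_instances)
    # Step 2: track average invocations per instance (measured per minute).
    policy = dict(
        common,
        PolicyName=f"{endpoint_name}-invocations-target",  # hypothetical name
        PolicyType="TargetTrackingScaling",
        TargetTrackingScalingPolicyConfiguration={
            "TargetValue": float(invocations_per_instance_target),
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
            },
            "ScaleOutCooldown": 60,   # seconds, assumed value
            "ScaleInCooldown": 300,   # seconds, assumed value
        },
    )
    return target, policy


def apply_autoscaling(target, policy):
    """Send the payloads to AWS; requires boto3 and valid credentials."""
    import boto3  # imported here so the payload builder works without it

    client = boto3.client("application-autoscaling")
    client.register_scalable_target(**target)
    client.put_scaling_policy(**policy)


# Hypothetical endpoint and numbers, for illustration only.
target, policy = build_scaling_requests(
    endpoint_name="my-endpoint",
    variant_name="AllTraffic",
    min_instances=1,
    max_instances=4,
    invocations_per_instance_target=1000,
)
```

With a policy like this, a second instance is added when the tracked metric stays above the target you chose, rather than at a fixed number of concurrent requests.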