SageMaker Real-Time Inference pricing clarification

Hello, can you please help me understand the pricing of SageMaker Real-Time Inference? Suppose I choose the ml.c7g.16xlarge instance (128 GiB memory, 64 vCPUs) with the following expected load: each request has a 2 MB input payload and a 2 MB output payload, and I expect 2 million requests per month. The deployed model is 2 GB and the endpoint runs 24 hours a day, 7 days a week. If auto scaling is not turned on, what would the monthly charge be? Are there any restrictions on the number of requests?

1 Answer

Hi User,

Real-time inference cost can be broken down into two components:

  1. Per-hour charges for your instance
  2. Data processed in and out, charged per GB

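In your case, you would be charged the per-hour price of the ml.c7g.16xlarge instance for every hour the endpoint is running, plus the per-GB data charge on (2 MB + 2 MB) × 2 million requests, which comes to roughly 8 TB of data in and out per month (about 4 TB in each direction). Pricing examples can be found on the Amazon SageMaker pricing page.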
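The two components above can be combined into a rough monthly estimate. The following is a minimal sketch, not an official calculator: the function name, the 730-hour month, the 1 GB = 1000 MB conversion, and all rates in the usage note are assumptions for illustration. Look up the real hourly and per-GB rates for your region on the SageMaker pricing page.

```python
# Assumption: an average month is 730 hours (365 days * 24 h / 12).
HOURS_PER_MONTH = 730


def monthly_endpoint_cost(hourly_rate_usd, instance_count, requests_per_month,
                          input_mb, output_mb,
                          data_in_usd_per_gb, data_out_usd_per_gb,
                          hours=HOURS_PER_MONTH):
    """Estimate the monthly cost of an always-on real-time endpoint.

    All rates are inputs; this function contains no real AWS prices.
    Assumption: 1 GB = 1000 MB for the data charge.
    """
    # Component 1: instance hours, billed whether or not requests arrive.
    instance_usd = hourly_rate_usd * instance_count * hours

    # Component 2: data processed in and out, billed per GB.
    data_gb_in = requests_per_month * input_mb / 1000
    data_gb_out = requests_per_month * output_mb / 1000
    data_usd = (data_gb_in * data_in_usd_per_gb
                + data_gb_out * data_out_usd_per_gb)

    return {
        "instance_usd": instance_usd,
        "data_gb_in": data_gb_in,
        "data_gb_out": data_gb_out,
        "data_usd": data_usd,
        "total_usd": instance_usd + data_usd,
    }
```

For the scenario in the question, a call would look like `monthly_endpoint_cost(hourly_rate_usd=1.00, instance_count=1, requests_per_month=2_000_000, input_mb=2, output_mb=2, data_in_usd_per_gb=0.01, data_out_usd_per_gb=0.01)`, where 1.00 and 0.01 are placeholder rates, not actual prices.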

If your usage will be consistent over a period of time, consider a Savings Plan to reduce cost.

Hope it helps!

ljunkai (AWS, Expert)
answered a year ago
  • Got it, thank you! Could you help me understand at what point I would need more than one instance in my case? Would it be when the instance receives more than 64 requests at a time?

  • Hi User,

    You can define your own auto-scaling rules and apply them to your hosted model, for example rules based on invocations per instance or CPU utilization per instance. These rules scale the number of instances up and down with load.

    Reference Link: https://docs.aws.amazon.com/sagemaker/latest/dg/endpoint-auto-scaling.html
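An invocations-per-instance rule like the one mentioned above can be sketched with the Application Auto Scaling API in boto3. This is an illustrative sketch under assumptions: the endpoint name, variant name, capacity limits, target value, and cooldowns are hypothetical placeholders, and the helper function names are made up for this example. The right target value depends on load testing your own model.

```python
def build_scalable_target(endpoint_name, variant_name, min_instances, max_instances):
    """Request parameters for register_scalable_target (values are placeholders)."""
    return {
        "ServiceNamespace": "sagemaker",
        "ResourceId": f"endpoint/{endpoint_name}/variant/{variant_name}",
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
        "MinCapacity": min_instances,
        "MaxCapacity": max_instances,
    }


def build_invocations_policy(endpoint_name, variant_name, target_invocations):
    """Request parameters for a target-tracking policy on invocations per instance."""
    return {
        "PolicyName": f"{endpoint_name}-invocations-per-instance",  # hypothetical name
        "ServiceNamespace": "sagemaker",
        "ResourceId": f"endpoint/{endpoint_name}/variant/{variant_name}",
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            # Assumed target: average invocations per instance per minute.
            "TargetValue": float(target_invocations),
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance",
            },
            "ScaleInCooldown": 300,   # seconds, placeholder
            "ScaleOutCooldown": 60,   # seconds, placeholder
        },
    }


def apply_autoscaling(endpoint_name, variant_name, min_instances,
                      max_instances, target_invocations):
    """Register the target and attach the policy. Requires boto3 and AWS credentials."""
    import boto3  # imported here so the builders above work without boto3 installed

    client = boto3.client("application-autoscaling")
    client.register_scalable_target(
        **build_scalable_target(endpoint_name, variant_name,
                                min_instances, max_instances))
    client.put_scaling_policy(
        **build_invocations_policy(endpoint_name, variant_name, target_invocations))
```

Calling `apply_autoscaling("my-endpoint", "AllTraffic", 1, 4, 100)` would let the endpoint grow from one to four instances; every argument in that call is a placeholder. Splitting the parameter builders from the API calls keeps the configuration easy to inspect before anything is sent to AWS.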
