What is the granularity of billing for inference on Bedrock fine-tuned models?


I'm building a small project which will use Llama 2 fine-tuning. It's likely to have very little inference usage as it's a proof of concept - maybe a few seconds per hour.

The billing page doesn't go into much detail about the inference costs for fine-tuned models. It says that it's $23.50 for 1 model unit per hour, but it doesn't mention the granularity of the billing.

If it's billed per second, awesome! If it's billed per hour, it's probably not viable for a small project.

asked 3 months ago, 573 views
2 Answers

To add to Didier's response: once customised, Llama 2 models can only be served through provisioned throughput. As of today, you can commit to either 1 month or 6 months (I'm sure you can arrange longer if you get in touch with the AWS team). The price quoted on the pricing page is per hour.

So the estimate of monthly cost would be:

hourly provisioned throughput price (e.g. $21.18) x 24 hours x (365 / 12) days = $15,461.40, plus, I assume, $1.95 per month for storage of the customised model.
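The estimate above can be sketched in a few lines of Python. The function name and the `model_units` parameter are illustrative rather than anything from the Bedrock API, and the rate is simply the $21.18 example figure, so check the console for the current price.

```python
HOURS_PER_DAY = 24
AVG_DAYS_PER_MONTH = 365 / 12  # average month length, as in the estimate above


def monthly_provisioned_cost(hourly_rate_usd: float, model_units: int = 1) -> float:
    """Cost of keeping provisioned throughput running for an average month."""
    cost = hourly_rate_usd * model_units * HOURS_PER_DAY * AVG_DAYS_PER_MONTH
    return round(cost, 2)


compute = monthly_provisioned_cost(21.18)  # example hourly rate from this answer
storage = 1.95                             # assumed monthly storage charge
print(f"compute: ${compute:,.2f}  compute + storage: ${compute + storage:,.2f}")
```

The key point is that the hourly rate is multiplied by every hour in the month, not by the hours in which requests were actually made.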

You can view the cost yourself in the AWS console by opening Amazon Bedrock and clicking Provisioned Throughput.

This isn't serverless-style pricing: you commit to the server, so to speak, rather than paying only while it's in use.

I believe the pricing is structured this way because custom models run on single-tenant endpoints, although I can't find any documentation that confirms this.

Didier, please correct me if I'm wrong.

JamesM
answered 3 months ago

Hi,

From the examples on the pricing page (https://aws.amazon.com/bedrock/pricing/):

Customization (fine-tuning) pricing

An application developer customizes the Llama 2 Pre-trained (70B) model using 1000 tokens of data. After training, uses custom model provisioned throughput for one hour to evaluate the performance of the model. The fine-tuned model is stored for one month. After evaluation, the developer uses provisioned throughput (1mo commit) to host the customized model.

Monthly cost incurred for fine-tuning is: Fine tuning training ($0.00799 * 1000) + custom model storage per month ($1.95) + one hour of custom model inference ($23.50) = $33.44

Monthly cost incurred for provisioned throughput (1-mo commit) of custom model = $21.18
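For reference, the $33.44 fine-tuning figure in the quoted example can be reproduced as below. This is only a sketch using the numbers quoted above; the variable names are illustrative and not part of any AWS tooling.

```python
# Figures as quoted in the pricing page example above.
training_cost = 0.00799 * 1000   # fine-tuning training charge for 1000 tokens
storage_cost = 1.95              # custom model storage for one month
evaluation_cost = 23.50          # one hour of custom model inference

fine_tuning_total = round(training_cost + storage_cost + evaluation_cost, 2)
print(f"fine-tuning total: ${fine_tuning_total:.2f}")
```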

So you'll have a fixed monthly cost of $21.18 once fine-tuning is done.

Best,

Didier

AWS EXPERT
answered 3 months ago
  • Sorry, but that example (from the pricing page) doesn't answer the question about granularity. In it, the developer has purchased 1 hour of provisioned throughput, but nothing is said about how that hour is used or billed, or at what granularity.

    To make it simpler: say my application has sporadic usage and, for one day, makes one request per hour, each needing 1 minute of inference. If this is billed at 1-hour granularity, I would be billed $564 ($23.50 * 24). If it is billed at 1-minute granularity, I would be billed $9.40 ($23.50 / 60 * 24).

    I also think the monthly cost of $21.18 in the pricing page example is incorrect: the price is charged per hour, and all the other examples (e.g. Titan) put the monthly figure in the tens of thousands.
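The two scenarios in the comment above can be compared with a small sketch. Note the assumptions: the `billed_cost` helper and its round-up-to-granularity rule are hypothetical, and nothing here confirms how Bedrock actually meters provisioned throughput. It only reproduces the $564 and $9.40 figures from the comment.

```python
import math

HOURLY_RATE_USD = 23.50  # hourly price per model unit quoted in the question


def billed_cost(minutes_per_burst: float, bursts: int, granularity_minutes: int) -> float:
    """Round each burst up to the billing granularity, then price it at the hourly rate.

    Hypothetical model of the comment's scenarios, not a documented AWS rule.
    """
    billed_minutes = math.ceil(minutes_per_burst / granularity_minutes) * granularity_minutes
    return round(billed_minutes * bursts * HOURLY_RATE_USD / 60, 2)


hourly_granularity = billed_cost(1, 24, 60)   # 1 minute of use, 24 times, billed per hour
minute_granularity = billed_cost(1, 24, 1)    # same usage, billed per minute
print(f"per hour: ${hourly_granularity:.2f}  per minute: ${minute_granularity:.2f}")
```

Per the first answer, provisioned throughput is a commitment billed for every hour whether or not requests are made, so neither scenario may apply in practice.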
