To add to Didier's response: Llama 2 customised models are available only through provisioned throughput after customisation. As of today, you can commit to either 1 month or 6 months (I'm sure you can do longer if you get in touch with the AWS team). The price quoted on the pricing page is per hour.
So the estimate of monthly cost would be:
$ProvisionedCost (e.g. $21.18) x 24 hours x (365 / 12) days = $15,461.40. On top of that, I assume $1.95 per month for storage of the customised model.
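To make the arithmetic easy to re-run with a different hourly rate, here is a minimal sketch of the same estimate. The function name and parameters are made up for illustration; the $21.18 hourly rate and the $1.95 storage figure are the numbers quoted in this thread, not values fetched from AWS.

```python
from decimal import Decimal, ROUND_HALF_UP

HOURS_PER_DAY = 24
DAYS_PER_YEAR = 365
MONTHS_PER_YEAR = 12


def monthly_provisioned_cost(hourly_rate, storage_per_month=Decimal("0")):
    """Estimate one average month of always-on provisioned throughput.

    Hypothetical helper: assumes the model is billed for every hour of
    the commitment, regardless of how much it is actually used.
    """
    throughput = hourly_rate * HOURS_PER_DAY * DAYS_PER_YEAR / MONTHS_PER_YEAR
    total = throughput + storage_per_month
    return total.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


# Figures quoted in the thread (assumptions, check the pricing page).
throughput_only = monthly_provisioned_cost(Decimal("21.18"))
with_storage = monthly_provisioned_cost(Decimal("21.18"), Decimal("1.95"))

print(throughput_only)  # throughput for an average month
print(with_storage)     # throughput plus assumed storage charge
```

Decimal is used instead of float only so the cents come out exactly; the estimate itself is the same multiplication as above.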
You can check this cost yourself in the AWS console by opening Amazon Bedrock and clicking Provisioned Throughput.
This isn't serverless-style pricing: you commit to the server, so to speak, rather than paying only when it's in use.
I believe the cost is this high because custom models run on single-tenant endpoints, though I can't find any exact documentation on that.
Didier, please correct me if I'm wrong.
Hi,
From the examples on the pricing page https://aws.amazon.com/bedrock/pricing/
Customization (fine-tuning) pricing
An application developer customizes the Llama 2 Pre-trained (70B) model using 1000
tokens of data. After training, uses custom model provisioned throughput for
one hour to evaluate the performance of the model. The fine-tuned model is stored
for one month. After evaluation, the developer uses provisioned throughput (1mo commit)
to host the customized model.
Monthly cost incurred for fine-tuning is: Fine tuning training ($0.00799 * 1000) + custom
model storage per month ($1.95) + one hour of custom model inference ($23.50) = $33.44
Monthly cost incurred for provisioned throughput (1-mo commit) of custom model = $21.18
So, you'll have a fixed monthly cost of $21.18 after fine-tuning is done.
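For anyone who wants to plug in their own token count, the fine-tuning part of that example can be sketched as below. The helper and its parameter names are hypothetical, and the rate is applied exactly the way the quoted example applies it ($0.00799 * 1000); check the pricing page for the actual unit.

```python
from decimal import Decimal, ROUND_HALF_UP


def fine_tuning_cost(units, training_rate, storage_per_month,
                     inference_hours, inference_hourly_rate):
    """Sum the three charges from the pricing page example.

    Hypothetical helper: `units` is multiplied by `training_rate`
    exactly as in the quoted example.
    """
    training = training_rate * units
    inference = inference_hourly_rate * inference_hours
    total = training + storage_per_month + inference
    return total.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


# Numbers copied from the quoted pricing page example.
example_total = fine_tuning_cost(
    units=1000,
    training_rate=Decimal("0.00799"),
    storage_per_month=Decimal("1.95"),
    inference_hours=1,
    inference_hourly_rate=Decimal("23.50"),
)

print(example_total)
```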
Best,
Didier
Sorry, but that example (from the pricing page) doesn't answer the question about granularity. In that example, the developer has purchased 1 hour of provisioned throughput, but it doesn't say anything about how that hour is used or billed, or at what granularity.
To make it simpler: let's say my application has sporadic usage and makes one request per hour for a day, each needing 1 minute of inference. If this is billed at 1 hour granularity, I would be billed $564 ($23.50 * 24). If this is billed at 1 minute granularity, this would be $9.40 ($23.50 / 60 * 24).
I also think the monthly cost of $21.18 in the pricing page example is incorrect, as it's charged per hour, and all other examples calculate this in the tens of thousands (e.g. Titan).
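The two scenarios in this comment can be compared with a small sketch like the one below. It only reproduces the comment's arithmetic under an assumed "round each request up to the billing increment" rule. It says nothing about how Bedrock actually bills, and the function name and parameters are invented for illustration.

```python
import math
from decimal import Decimal, ROUND_HALF_UP


def billed_cost(requests, minutes_per_request, hourly_rate, granularity_minutes):
    """Cost if each request is rounded up to whole billing increments.

    Hypothetical model of usage-based billing; a committed provisioned
    throughput term may not be billed this way at all.
    """
    increments = math.ceil(minutes_per_request / granularity_minutes)
    billed_minutes = increments * granularity_minutes * requests
    cost = hourly_rate * billed_minutes / 60
    return cost.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


rate = Decimal("23.50")  # hourly rate quoted in the thread

# 24 requests in a day, 1 minute of inference each.
hourly_granularity = billed_cost(24, 1, rate, granularity_minutes=60)
minute_granularity = billed_cost(24, 1, rate, granularity_minutes=1)

print(hourly_granularity)
print(minute_granularity)
```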