I am not finding it published in documentation, but if you ask Amazon Q in the AWS console, you get responses referencing WordPiece and SentencePiece tokenizers.
To calculate the number of tokens in a prompt before sending it to a large language model (LLM) service such as Titan Text, you typically need the same tokenizer the model uses, or a very similar one. If the exact tokenizer used by Titan Text is publicly available, you can install its library and run it locally on your text to get the count.
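As a rough illustration of what counting tokens locally looks like, here is a minimal sketch of WordPiece-style greedy longest-match tokenization. Everything in it is an assumption made for illustration: `TOY_VOCAB`, `wordpiece_tokenize`, and `count_tokens` are hypothetical names, and the vocabulary is a toy one, not Titan Text's. The counts it produces will not match what the service bills; it only shows the mechanics.

```python
# Sketch only: WordPiece-style greedy longest-match-first tokenization.
# TOY_VOCAB is a made-up vocabulary, NOT the one used by Titan Text.
TOY_VOCAB = {"token", "##izer", "##s", "count", "the", "##ing"}


def wordpiece_tokenize(word, vocab, unk="[UNK]"):
    """Split one word into the longest vocabulary pieces, left to right."""
    tokens = []
    start = 0
    while start < len(word):
        end = len(word)
        piece = None
        # Shrink the candidate from the right until it is in the vocabulary.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                # Pieces that continue a word carry the "##" prefix.
                candidate = "##" + candidate
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            # No piece matched: the whole word becomes one unknown token.
            return [unk]
        tokens.append(piece)
        start = end
    return tokens


def count_tokens(text, vocab):
    """Estimate the token count of whitespace-separated, lowercased text."""
    return sum(len(wordpiece_tokenize(w, vocab)) for w in text.lower().split())


if __name__ == "__main__":
    print(wordpiece_tokenize("tokenizers", TOY_VOCAB))
    print(count_tokens("count the tokenizers", TOY_VOCAB))
```

With a real tokenizer you would load the model's published vocabulary instead of `TOY_VOCAB`. Without that vocabulary, any local count is only an estimate.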
Yes, but I'm afraid you've just added extra context to the question rather than answering it. I'm looking for the exact tokenizer. Since use of the LLMs is charged partly per input token, I would expect to be able to calculate this before sending the prompt.
Thanks for searching. I haven't accepted the answer yet, as I would ideally like to know the exact tokenizer. However, WordPiece and SentencePiece might be usable as estimates.