What tokenizer do the Titan Text models use?

0

I would like to calculate the number of tokens before sending a prompt to the Titan Text LLM service. What tokenizer do the Titan Text models use, and how can I run it locally?

Ben
Asked 3 months ago, 749 views
2 Answers
0

I am not finding it published in documentation, but if you ask Amazon Q in the AWS console, you get responses referencing WordPiece and SentencePiece tokenizers.

AWS
Expert
iBehr
Answered 3 months ago
  • Thanks for searching. I haven't accepted the answer yet, as I would ideally like to know the exact tokenizer. However, WordPiece and SentencePiece could serve as estimates.

0

To calculate the number of tokens for a given prompt before sending it to a Titan Text-like large language model (LLM) service, you typically need the same tokenizer or a very similar one. If the exact tokenizer used by Titan Text is publicly available, you can run it locally by installing the corresponding library and tokenizing your text with it.
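Until the exact tokenizer is confirmed, a character-based estimate may be enough for rough budgeting. The sketch below is only an approximation: the function name `estimate_tokens` and the default ratio of 4 characters per token are assumptions for illustration, not published Titan Text figures, so the ratio should be calibrated against token counts observed for your own prompts.

```python
import math

# Assumption: average number of characters per token.
# This is an illustrative default, NOT a documented Titan Text value.
ASSUMED_CHARS_PER_TOKEN = 4.0


def estimate_tokens(text: str, chars_per_token: float = ASSUMED_CHARS_PER_TOKEN) -> int:
    """Return a rough token-count estimate based only on text length.

    This does not run any real tokenizer; it divides the character count
    by an assumed characters-per-token ratio and rounds up.
    """
    if chars_per_token <= 0:
        raise ValueError("chars_per_token must be positive")
    if not text:
        return 0
    return math.ceil(len(text) / chars_per_token)


if __name__ == "__main__":
    prompt = "What tokenizer do the Titan Text models use?"
    print(estimate_tokens(prompt))
```

Rounding up keeps the estimate on the conservative side for short prompts. Non-English text and code usually tokenize less efficiently than English prose, so a smaller `chars_per_token` value may be more appropriate there.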

Expert
Answered 3 months ago
  • Yes, but I'm afraid you've just added extra context to the question and not really answered it. I'm looking for the exact tokenizer. Since use of the LLMs is partly charged per input token, I would expect to be able to calculate this before sending the prompt.
