Kitoken. Tokenize Everything!
Example with the Llama 3.3 tokenizer (7 tokens): "Kit" 7850, "oken" 1713, "." 13, " Token" 9857, "ize" 553, " Everything" 20696, "!" 0
Fast and versatile tokenizer for language models compatible with SentencePiece, Tokenizers, Tiktoken and more.
What is a Tokenizer?
A tokenizer turns language model inputs from text into a series
of numbers, called tokens, which a language model is trained to
understand.
There are many algorithms for this process;
BPE, Unigram and WordPiece
are the most popular and widespread.
All language models use a tokenizer for their text inputs, each with
a different set of available tokens.
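To make the text-to-numbers step concrete, here is a minimal Python sketch. It is not Kitoken's API and not a real BPE, Unigram or WordPiece implementation: it uses simple greedy longest-match lookup, which is swapped in only because it is short. The vocabulary `TOY_VOCAB` and the helpers `encode` and `decode` are hypothetical; the token IDs mirror the Llama 3.3 example at the top of this page, and a real model's vocabulary holds many thousands of entries.

```python
# Toy vocabulary mapping token text to token ID.
# Hypothetical and tiny; the IDs mirror the Llama 3.3 example above.
TOY_VOCAB = {
    "Kit": 7850,
    "oken": 1713,
    ".": 13,
    " Token": 9857,   # note the leading space: it is part of the token
    "ize": 553,
    " Everything": 20696,
    "!": 0,
}


def encode(text, vocab):
    """Turn text into token IDs using greedy longest-match lookup.

    This is an illustrative stand-in, not BPE, Unigram or WordPiece.
    """
    ids = []
    pos = 0
    max_len = max(len(token) for token in vocab)
    while pos < len(text):
        # Try the longest candidate first, then shrink until one matches.
        for length in range(min(max_len, len(text) - pos), 0, -1):
            piece = text[pos:pos + length]
            if piece in vocab:
                ids.append(vocab[piece])
                pos += length
                break
        else:
            raise ValueError(f"no token covers the text at position {pos}")
    return ids


def decode(ids, vocab):
    """Turn token IDs back into text by reversing the vocabulary."""
    inverse = {token_id: token for token, token_id in vocab.items()}
    return "".join(inverse[token_id] for token_id in ids)


text = "Kitoken. Tokenize Everything!"
ids = encode(text, TOY_VOCAB)
print(ids)                      # [7850, 1713, 13, 9857, 553, 20696, 0]
print(decode(ids, TOY_VOCAB))   # Kitoken. Tokenize Everything!
```

A different vocabulary would split the same text into different pieces with different IDs, which is why a tokenizer has to match the model it feeds.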
Kitoken is a tokenizer for any model.