1 unstable release: 0.0.1 (Oct 19, 2023)
💥 fastok
BPE in Rust with bindings to Python using PyO3
Development
maturin develop
Python bindings
>>> from fastok import PreTokenizer
>>> pre_tokenizer = PreTokenizer(model="gpt2")
>>> pre_tokenizer.pre_tokenize_str("My name is Alvaro and I live in Barcelona.")
['My', ' name', ' is', ' Alvaro', ' and', ' I', ' live', ' in', ' Barcelona', '.']
>>> pre_tokenizer.pre_tokenize(["My name is Alvaro and I live in Barcelona.", "I like pizza."])
[['My', ' name', ' is', ' Alvaro', ' and', ' I', ' live', ' in', ' Barcelona', '.'], ['I', ' like', ' pizza', '.']]
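For readers curious about what a GPT-2 style pre-tokenizer does, the splits above can be approximated in pure Python. This is only a sketch: it is not fastok's Rust implementation, which may differ. The original GPT-2 pattern uses the Unicode classes `\p{L}` and `\p{N}`, which the standard `re` module lacks, so this version substitutes `[^\W\d_]` and `\d`. The names `GPT2_LIKE_PATTERN`, `pre_tokenize_str` and `pre_tokenize` below are illustrative stand-ins that mirror the method names shown above.

```python
import re

# Approximation of the GPT-2 pre-tokenization regex using only the stdlib.
# Assumption: [^\W\d_] stands in for \p{L} and \d stands in for \p{N}.
GPT2_LIKE_PATTERN = re.compile(
    r"'s|'t|'re|'ve|'m|'ll|'d"   # common English contractions
    r"| ?[^\W\d_]+"              # optional leading space + letters
    r"| ?\d+"                    # optional leading space + digits
    r"| ?(?:[^\s\w]|_)+"         # optional leading space + punctuation/symbols
    r"|\s+(?!\S)"                # whitespace not directly followed by a token
    r"|\s+"                      # any remaining whitespace
)


def pre_tokenize_str(text: str) -> list[str]:
    """Split one string into pre-tokens, keeping leading spaces attached."""
    return GPT2_LIKE_PATTERN.findall(text)


def pre_tokenize(texts: list[str]) -> list[list[str]]:
    """Split a batch of strings, one list of pre-tokens per input."""
    return [pre_tokenize_str(text) for text in texts]


if __name__ == "__main__":
    print(pre_tokenize_str("My name is Alvaro and I live in Barcelona."))
    print(pre_tokenize(["I like pizza."]))
```

Keeping the leading space attached to each word is what lets the later BPE merge step tell a word at the start of a text from the same word in the middle of a sentence.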