
tiktoken-rust

a fast BPE tokeniser for use with OpenAI's models

3 releases

0.2.1 Jul 31, 2023
0.2.0 May 4, 2023
0.1.0 Apr 28, 2023

#975 in Machine learning

MIT license

57KB
1K SLoC

tiktoken-rust

STATUS: Under development.

tiktoken is a fast BPE (byte-pair encoding) tokeniser for use with OpenAI's models. The original project exposes it through a Python interface.

This project is a fork of the original repository that brings the same capability to the Rust ecosystem.

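use tiktoken_rust as tt;

// Load the cl100k_base encoding.
let enc = tt::get_encoding("cl100k_base").unwrap();

// Encoding and then strictly decoding should round-trip the input text.
assert_eq!(
    "hello world",
    enc.decode(&enc.encode_ordinary("hello world"), tt::DecodeMode::Strict).unwrap()
);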
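
For readers new to byte-pair encoding, the encode/decode round trip in the example above can be pictured with a toy sketch that uses only the standard library. This is not the crate's implementation: the merge table, the ids 256 to 258, and the names `toy_merges`, `bpe_encode`, and `bpe_decode` are all hypothetical and chosen purely for illustration. Real encodings such as cl100k_base use a far larger merge table and split the text with a regular expression before merging.

```rust
use std::collections::HashMap;

/// Hypothetical merge table: (left id, right id) -> merged id.
/// Ids 0..=255 are raw bytes; lower merged ids are applied first.
fn toy_merges() -> HashMap<(u32, u32), u32> {
    let mut merges = HashMap::new();
    merges.insert((u32::from(b'l'), u32::from(b'l')), 256); // "ll"
    merges.insert((u32::from(b'h'), u32::from(b'e')), 257); // "he"
    merges.insert((257, 256), 258); // "hell"
    merges
}

/// Start from raw bytes and repeatedly merge the adjacent pair with the
/// lowest merged id (leftmost on ties) until no known pair remains.
fn bpe_encode(text: &str, merges: &HashMap<(u32, u32), u32>) -> Vec<u32> {
    let mut tokens: Vec<u32> = text.bytes().map(u32::from).collect();
    loop {
        let best = tokens
            .windows(2)
            .enumerate()
            .filter_map(|(i, w)| merges.get(&(w[0], w[1])).map(|&id| (id, i)))
            .min();
        match best {
            Some((id, i)) => {
                tokens[i] = id; // replace the left half with the merged id
                tokens.remove(i + 1); // drop the right half
            }
            None => return tokens,
        }
    }
}

/// Expand every token back into the bytes it was built from.
fn bpe_decode(tokens: &[u32], merges: &HashMap<(u32, u32), u32>) -> Vec<u8> {
    // Invert the table: merged id -> the pair it was built from.
    let parts: HashMap<u32, (u32, u32)> =
        merges.iter().map(|(&pair, &id)| (id, pair)).collect();

    fn expand(id: u32, parts: &HashMap<u32, (u32, u32)>, out: &mut Vec<u8>) {
        match parts.get(&id) {
            Some(&(a, b)) => {
                expand(a, parts, out);
                expand(b, parts, out);
            }
            None => out.push(id as u8), // ids below 256 are raw bytes
        }
    }

    let mut out = Vec::new();
    for &t in tokens {
        expand(t, &parts, &mut out);
    }
    out
}

fn main() {
    let merges = toy_merges();
    let tokens = bpe_encode("hello", &merges);
    let bytes = bpe_decode(&tokens, &merges);
    println!("{:?} -> {:?}", tokens, String::from_utf8(bytes));
}
```

With this toy table, "hello" collapses from five byte tokens to two, and decoding restores the original bytes. Rescanning the whole token list on every merge keeps the sketch short; a production tokeniser would avoid that repeated scan.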

lib.rs:

tiktoken_rust

This crate is a tokeniser for use with OpenAI's models.

Dependencies

~10–23MB
~340K SLoC