2 releases
Uses new Rust 2024
| Version | Released |
|---|---|
| 0.0.2 (new) | Apr 24, 2025 |
| 0.0.1 | Apr 14, 2025 |
#305 in Text processing
110 downloads per month
245KB, 1K SLoC
🦛 ChonkieR 🦀✨
The no-nonsense, lightweight and fast chunking library that's ready to CHONK your text, in Rust 🦀!
Installation • Usage • Chunkers • Acknowledgements • Citation
Chonkie just got low-leveled! 🦀 Your favorite Python chunking library is now in Rust~ faster, smaller, and more reliable than ever!
🦀 Rusty & Reliable: Built with Rust for memory safety and performance.
🚀 Feature-rich: All the CHONKs you'd ever need
✨ Easy to use: Add Crate, Use Crate, CHONK
⚡ Blazingly Fast: CHONK at the speed of Rust! zooooom
🪶 Light-weight: No bloat, just CHONK
🦛 Cute CHONK mascot: psst it's a pygmy hippo btw
❤️ Moto Moto's favorite Rust library
ChonkieR is a chunking library that "just works" ✨
Installation
To add ChonkieR to your project, run:
cargo add chonkier # Or add it to your Cargo.toml
ChonkieR follows the rule of minimum dependencies. Features can be enabled via Cargo features.
Don't want to think about it? Simply enable the all feature (not recommended for production binaries unless needed):
# Cargo.toml
[dependencies]
chonkier = { version = "0.0.2", features = ["all"] } # Replace with desired version
Usage
Here's a basic example to get you started:
use chonkier::CharacterTokenizer;
use chonkier::RecursiveChunker;
use chonkier::types::RecursiveRules;

fn main() {
    // Initialize the chunker: character-level tokenizer, 512-token chunks, default rules
    let chunker = RecursiveChunker::new(CharacterTokenizer::new(), 512, RecursiveRules::default());

    // Chunk some text (the chunk type is inferred, so no extra import is needed)
    let text = "ChonkieR is the goodest boi! My favorite chunking hippo hehe.";
    let chunks = chunker.chunk(text);

    // Access chunks
    for chunk in chunks {
        println!("Chunk: {}", chunk.text);
        println!("Tokens: {}", chunk.token_count);
    }
}
Check out more usage examples in the examples folder!
Chunkers
ChonkieR currently supports the following chunkers:
- TokenChunker: Split text into fixed-size token chunks.
- SentenceChunker: Split text into chunks based on sentence boundaries.
- RecursiveChunker: Recursively split the text into chunks based on the rules provided.
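To make the three strategies concrete, here is a minimal standard-library sketch of each idea. It is not ChonkieR's implementation and does not use its API: the function names `fixed_size_chunks`, `sentence_chunks`, and `recursive_chunks` are hypothetical, every character is assumed to count as one token, and the recursive version does not merge small pieces back together.

```rust
/// Fixed-size chunking: at most `chunk_size` characters per chunk
/// (assumption: one `char` is one token).
fn fixed_size_chunks(text: &str, chunk_size: usize) -> Vec<String> {
    assert!(chunk_size > 0, "chunk_size must be positive");
    let chars: Vec<char> = text.chars().collect();
    chars
        .chunks(chunk_size)
        .map(|piece| piece.iter().collect())
        .collect()
}

/// Sentence chunking: greedily pack whole sentences into chunks of at most
/// `chunk_size` characters. An oversized sentence becomes its own chunk.
fn sentence_chunks(text: &str, chunk_size: usize) -> Vec<String> {
    let mut chunks = Vec::new();
    let mut current = String::new();
    let mut current_len = 0;
    // `split_inclusive` keeps each terminator attached to its sentence.
    for sentence in text.split_inclusive(|c| matches!(c, '.' | '!' | '?')) {
        let len = sentence.chars().count();
        if current_len > 0 && current_len + len > chunk_size {
            chunks.push(std::mem::take(&mut current));
            current_len = 0;
        }
        current.push_str(sentence);
        current_len += len;
    }
    if !current.is_empty() {
        chunks.push(current);
    }
    chunks
}

/// Recursive chunking: split on the first separator, then recurse into any
/// piece that is still too long using the remaining separators. Falls back
/// to fixed-size splitting once the separators run out.
fn recursive_chunks(text: &str, chunk_size: usize, separators: &[&str]) -> Vec<String> {
    if text.chars().count() <= chunk_size {
        return vec![text.to_string()];
    }
    match separators.split_first() {
        None => fixed_size_chunks(text, chunk_size),
        Some((sep, rest)) => text
            .split_inclusive(*sep)
            .flat_map(|piece| recursive_chunks(piece, chunk_size, rest))
            .collect(),
    }
}

fn main() {
    let text = "ChonkieR is the goodest boi! My favorite chunking hippo hehe.";
    for chunk in fixed_size_chunks(text, 16) {
        println!("fixed: {chunk:?}");
    }
    for chunk in sentence_chunks(text, 40) {
        println!("sentence: {chunk:?}");
    }
    for chunk in recursive_chunks(text, 20, &["! ", " "]) {
        println!("recursive: {chunk:?}");
    }
}
```

All three functions preserve the input exactly: concatenating the returned chunks gives back the original text, which is a handy property to check when comparing chunking strategies.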
Acknowledgements
ChonkieR would like to CHONK its way through a special thanks to all the users and contributors who have helped make this library what it is today! Your feedback, issue reports, and improvements have helped make ChonkieR the CHONKIEST it can be.
And of course, special thanks to Moto Moto for endorsing ChonkieR with his famous quote:
"I like them big, I like them chonkieR." ~ Moto Moto (He really said this)
Citation
If you use ChonkieR in your research, please cite it as follows:
@software{chonkie2025,
  author = {Minhas, Bhavnick AND Nigam, Shreyash},
  title = {Chonkie: A no-nonsense fast, lightweight, and efficient text chunking library},
  year = {2025},
  publisher = {GitHub},
  howpublished = {\url{https://github.com/chonkie-inc/chonkie}},
}
Dependencies: ~2–15MB, ~158K SLoC