1 unstable release

0.1.0 Jan 2, 2025

Apache-2.0

Tocken

Tocken is a tokenizer implemented in Rust.

It is based on Lucene's EnglishAnalyzer.
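
For readers unfamiliar with Lucene's EnglishAnalyzer, the sketch below illustrates the general shape of that kind of pipeline: split text into words, strip English possessives, lowercase, and drop stop words. It is not Tocken's API. The `analyze` and `stop_words` names are hypothetical, the stop-word list is a small illustrative subset, only ASCII apostrophes are handled, and the Porter stemming step that EnglishAnalyzer applies last is left out to keep the example short.

```rust
use std::collections::HashSet;

/// Hypothetical helper (not part of Tocken): a small illustrative
/// subset of English stop words.
fn stop_words() -> HashSet<&'static str> {
    ["a", "an", "and", "are", "in", "is", "of", "the", "to"]
        .into_iter()
        .collect()
}

/// Hypothetical helper (not part of Tocken): a simplified
/// EnglishAnalyzer-style pipeline without stemming.
fn analyze(text: &str, stop_words: &HashSet<&str>) -> Vec<String> {
    // 1. Split on anything that is not alphanumeric or an ASCII apostrophe.
    text.split(|c: char| !(c.is_alphanumeric() || c == '\''))
        .filter(|t| !t.is_empty())
        .map(|t| {
            // 2. Lowercase first, so a single suffix check covers 's and 'S.
            let lower = t.to_lowercase();
            // 3. Strip a trailing possessive, then any stray apostrophes.
            let stripped = lower.strip_suffix("'s").unwrap_or(lower.as_str());
            stripped.trim_matches(|c: char| c == '\'').to_string()
        })
        // 4. Drop empty tokens and stop words.
        .filter(|t| !t.is_empty() && !stop_words.contains(t.as_str()))
        .collect()
}
```

Calling `analyze("The cat's toys are in the box", &stop_words())` keeps only the content-bearing words. A stemmer would then reduce forms such as plurals to a common root.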

Usage

  • as a library: see the main.rs file and the docs for examples.
  • as a CLI (arguments after `--` are passed to the program rather than to Cargo):
    • cargo r -r -- --help
    • cargo r -r -- -i wiki.txt -o wiki_tocken_f10.json -f 10
