4 releases (2 breaking)

0.3.0 Apr 12, 2024
0.2.0 Sep 1, 2023
0.1.1 Jun 23, 2023
0.1.0 Jun 9, 2023

#281 in Database implementations

317,025 downloads per month
Used in 54 crates (8 directly)

MIT license

7KB
117 lines

# Tokenizer-API

An API to interface a tokenizer with tantivy.

The API will be kept stable in order to not break support for existing tokenizers.


lib.rs:

Tokenizers are in charge of chopping text into a stream of tokens ready for indexing. This is a separate crate from tantivy, so implementors don't need to update for each new tantivy version.

To add support for a tokenizer, implement the Tokenizer trait. Check out the tantivy repo for some examples.
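
As a rough illustration of what implementing the trait involves, here is a minimal sketch of a whitespace tokenizer. To keep it self-contained, it declares simplified local stand-ins for `Token`, `TokenStream` and `Tokenizer` that are modeled on this crate's types; they are assumptions for illustration and omit parts of the real API (for example, the crate's `Token` is assumed to carry more fields than shown). `WhitespaceTokenizer`, `WhitespaceTokenStream` and `collect_tokens` are hypothetical names. In real code you would import the traits from this crate instead of redefining them.

```rust
// Local stand-ins modeled on the crate's types (assumption: the real
// definitions live in this crate and may differ in detail).
#[derive(Debug, Clone, Default, PartialEq, Eq)]
pub struct Token {
    pub offset_from: usize, // byte offset of the token start
    pub offset_to: usize,   // byte offset just past the token end
    pub position: usize,    // index of the token in the stream
    pub text: String,
}

pub trait TokenStream {
    /// Moves to the next token; returns false once the stream is exhausted.
    fn advance(&mut self) -> bool;
    /// Returns the current token (valid after `advance` returned true).
    fn token(&self) -> &Token;
}

pub trait Tokenizer: 'static + Clone + Send + Sync {
    type TokenStream<'a>: TokenStream;
    fn token_stream<'a>(&'a mut self, text: &'a str) -> Self::TokenStream<'a>;
}

/// Hypothetical tokenizer that splits on Unicode whitespace.
#[derive(Clone, Default)]
pub struct WhitespaceTokenizer;

pub struct WhitespaceTokenStream<'a> {
    text: &'a str,
    cursor: usize, // byte offset where the next scan starts
    token: Token,  // reused between calls to avoid reallocating
    started: bool,
}

impl Tokenizer for WhitespaceTokenizer {
    type TokenStream<'a> = WhitespaceTokenStream<'a>;

    fn token_stream<'a>(&'a mut self, text: &'a str) -> WhitespaceTokenStream<'a> {
        WhitespaceTokenStream { text, cursor: 0, token: Token::default(), started: false }
    }
}

impl<'a> TokenStream for WhitespaceTokenStream<'a> {
    fn advance(&mut self) -> bool {
        let rest = &self.text[self.cursor..];
        // Skip leading whitespace; stop if nothing else is left.
        let start = match rest.find(|c: char| !c.is_whitespace()) {
            Some(i) => self.cursor + i,
            None => {
                self.cursor = self.text.len();
                return false;
            }
        };
        // The token runs until the next whitespace or the end of the text.
        let tail = &self.text[start..];
        let end = start + tail.find(char::is_whitespace).unwrap_or(tail.len());

        if self.started {
            self.token.position += 1;
        } else {
            self.started = true;
        }
        self.token.offset_from = start;
        self.token.offset_to = end;
        self.token.text.clear();
        self.token.text.push_str(&self.text[start..end]);
        self.cursor = end;
        true
    }

    fn token(&self) -> &Token {
        &self.token
    }
}

/// Hypothetical helper: drains a stream into a Vec for inspection.
pub fn collect_tokens(text: &str) -> Vec<Token> {
    let mut tokenizer = WhitespaceTokenizer;
    let mut stream = tokenizer.token_stream(text);
    let mut out = Vec::new();
    while stream.advance() {
        out.push(stream.token().clone());
    }
    out
}
```

Calling `collect_tokens("hello  tantivy world")` yields three tokens with increasing positions and byte offsets into the original text. The stream reuses a single `Token` buffer between calls to `advance`, which is one way to avoid allocating a new `String` per token.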

Dependencies

~0.4–1MB
~23K SLoC