libsimple

Rust bindings to simple, a SQLite3 fts5 tokenizer which supports Chinese and PinYin

12 releases

new 0.4.0 Mar 6, 2025
0.3.6 Jan 25, 2025
0.3.4 Oct 6, 2024
0.3.1 Jul 25, 2024

MIT license

Description

Rust bindings to simple, a SQLite3 fts5 tokenizer which supports Chinese and PinYin.

Usage

Add this to your Cargo.toml:

[dependencies]
libsimple = "~0.3"

Example

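use anyhow::Result;
use tempfile::tempdir;

fn main() -> Result<()> {
    // Register the tokenizer as an SQLite auto extension so that
    // connections opened afterwards can use it.
    libsimple::enable_auto_extension()?;
    // Write the bundled jieba dictionary files into a temporary directory.
    let dir = tempdir()?;
    libsimple::release_dict(&dir)?;

    // Open a connection and point it at the dictionary directory.
    let conn = rusqlite::Connection::open_in_memory()?;
    libsimple::set_dict(&conn, &dir)?;

    // Create an fts5 table that uses the 'simple' tokenizer and add two rows.
    conn.execute_batch("
        CREATE VIRTUAL TABLE d USING fts5(id, text, tokenize = 'simple');
        INSERT INTO d (id, text) VALUES (1, '中华人民共和国国歌');
        INSERT INTO d (id, text) VALUES (2, '周杰伦');
    ")?;
    // Chinese query built with jieba_query: expected to match row 1.
    assert_eq!(1, conn.query_row(
        "SELECT id FROM d WHERE text MATCH jieba_query('中华国歌')",
        [], |row| row.get::<_, i64>(0)
    )?);
    // PinYin query built with simple_query: expected to match row 2.
    assert_eq!(2, conn.query_row(
        "SELECT id FROM d WHERE text MATCH simple_query('zhoujiel')",
        [], |row| row.get::<_, i64>(0)
    )?);
    Ok(())
}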

License

Licensed under MIT license (LICENSE or http://opensource.org/licenses/MIT)

Version map

The table below lists the rusqlite version compatible with each libsimple version:

libsimple version    rusqlite version
=0.4.0               ~0.34
=0.3.7               ~0.34
=0.3.6               ~0.33
=0.3.5               ~0.33
=0.3.4               ~0.32
=0.3.3               ~0.32
=0.3.2               ~0.32
=0.3.1               ~0.32
=0.3.0               ~0.31
=0.2.2               ~0.31
=0.2.1               ~0.31
=0.2.0               ~0.31
=0.1.0               ~0.31
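
The version map above can also be written as a small lookup, which may be handy when checking a project's pinned versions in a script or test. This is only a sketch: `VERSION_MAP` and `rusqlite_requirement` are hypothetical names introduced for illustration and are not part of libsimple; the data is copied from the table above.

```rust
/// (libsimple version, compatible rusqlite requirement),
/// copied from the version map in this README.
const VERSION_MAP: &[(&str, &str)] = &[
    ("0.4.0", "~0.34"),
    ("0.3.7", "~0.34"),
    ("0.3.6", "~0.33"),
    ("0.3.5", "~0.33"),
    ("0.3.4", "~0.32"),
    ("0.3.3", "~0.32"),
    ("0.3.2", "~0.32"),
    ("0.3.1", "~0.32"),
    ("0.3.0", "~0.31"),
    ("0.2.2", "~0.31"),
    ("0.2.1", "~0.31"),
    ("0.2.0", "~0.31"),
    ("0.1.0", "~0.31"),
];

/// Hypothetical helper (not provided by libsimple): returns the rusqlite
/// requirement listed for an exact libsimple version, or `None` if the
/// version does not appear in the map.
pub fn rusqlite_requirement(libsimple_version: &str) -> Option<&'static str> {
    VERSION_MAP
        .iter()
        .find(|(ls, _)| *ls == libsimple_version)
        .map(|(_, rs)| *rs)
}
```

For example, `rusqlite_requirement("0.3.6")` yields `Some("~0.33")`, so a project pinned to libsimple 0.3.6 would declare rusqlite as `~0.33` in its Cargo.toml.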

Generate CMRC

This is only required when pinyin.txt is updated. Normal users can ignore this step.

cd simple && mkdir build && cd build
cmake .. -DBUILD_SQLITE3=off -DSIMPLE_WITH_JIEBA=off -DBUILD_TEST_EXAMPLE=off
make
cp -f _cmrc/include/cmrc/cmrc.hpp ../../cmrc/include/cmrc/cmrc.hpp
cp -f __cmrc_PINYIN_TEXT/lib.cpp ../../cmrc/pinyin.txt/lib.cpp
cp -f __cmrc_PINYIN_TEXT/intermediate/contrib/pinyin.txt.cpp ../../cmrc/pinyin.txt/pinyin.txt.cpp
cd .. && rm -r build && cd ..