esaxx-rs

Wrapper around sentencepiece's esaxx library

11 releases

0.1.10 Oct 5, 2023
0.1.9 Oct 5, 2023
0.1.8 Jun 6, 2022
0.1.7 Sep 2, 2021
0.1.1 Jun 6, 2020

#86 in Data structures

330,064 downloads per month
Used in 144 crates (3 directly)

Apache-2.0

175KB
857 lines

Small wrapper around sentencepiece's esaxx suffix array C++ library.

Usage

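#[cfg(feature="cpp")]
{
// suffix() is only available with the "cpp" feature (the C++ esaxx code).
let string = "abracadabra";
let suffix = esaxx_rs::suffix(string).unwrap();
let chars: Vec<_> = string.chars().collect();
// Each item is a (substring as &[char], frequency) pair.
let mut iter = suffix.iter();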
assert_eq!(iter.next().unwrap(), (&chars[..4], 2)); // abra
assert_eq!(iter.next(), Some((&chars[..1], 5))); // a
assert_eq!(iter.next(), Some((&chars[1..4], 2))); // bra
assert_eq!(iter.next(), Some((&chars[2..4], 2))); // ra
assert_eq!(iter.next(), Some((&chars[..0], 11))); // ''
assert_eq!(iter.next(), None);
}

The suffix function above calls unsafe, optimized C++ code. There is also a safe Rust implementation, suffix_rs, which is about 2x slower because it uses usize (64-bit on most platforms) instead of i32 (32-bit). It does, however, seem to fix a few out-of-bounds (OOB) issues in the C++ version (which never seemed to cause real problems in tests, but still).
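The usize-versus-i32 difference is easy to quantify: an i32 index takes 4 bytes, while usize takes 8 bytes on 64-bit targets, so every index array doubles in size. The sketch below only does that arithmetic. index_array_bytes is a hypothetical helper, not part of esaxx-rs, and the idea that the extra memory traffic accounts for the ~2x slowdown is an assumption rather than something measured here.

```rust
use std::mem::size_of;

/// Hypothetical helper (not part of esaxx-rs): bytes used by one index
/// array of `n` entries when every index is stored as a `T`.
fn index_array_bytes<T>(n: usize) -> usize {
    n * size_of::<T>()
}

fn main() {
    // Illustrative input size: one million chars.
    let n = 1_000_000;
    let with_i32 = index_array_bytes::<i32>(n); // 4 bytes per index
    let with_usize = index_array_bytes::<usize>(n); // 8 bytes per index on 64-bit targets
    println!("i32 indices:   {} bytes", with_i32);
    println!("usize indices: {} bytes", with_usize);
    if size_of::<usize>() == 8 {
        // Same number of entries, twice the memory to read and write.
        assert_eq!(with_usize, 2 * with_i32);
    }
}
```

Usage of the safe implementation: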

// Same usage as suffix(), but runs the safe Rust implementation.
let string = "abracadabra";
let suffix = esaxx_rs::suffix_rs(string).unwrap();
let chars: Vec<_> = string.chars().collect();
let mut iter = suffix.iter();
assert_eq!(iter.next().unwrap(), (&chars[..4], 2)); // abra
assert_eq!(iter.next(), Some((&chars[..1], 5))); // a
assert_eq!(iter.next(), Some((&chars[1..4], 2))); // bra
assert_eq!(iter.next(), Some((&chars[2..4], 2))); // ra
assert_eq!(iter.next(), Some((&chars[..0], 11))); // ''
assert_eq!(iter.next(), None);
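Both examples yield (substring, frequency) pairs. Assuming the frequency is simply the number of (possibly overlapping) occurrences of the substring, which is what the abracadabra numbers suggest, it can be cross-checked with a naive scan. count_occurrences is a hypothetical helper for illustration only; the library derives its counts from the suffix array rather than by scanning the text.

```rust
/// Hypothetical helper: count (possibly overlapping) occurrences of
/// `pattern` in `text` with a naive O(n * m) scan.
fn count_occurrences(text: &[char], pattern: &[char]) -> usize {
    if pattern.is_empty() {
        // Matches the example's ('', 11) entry: one per suffix of the text.
        return text.len();
    }
    text.windows(pattern.len())
        .filter(|w| *w == pattern)
        .count()
}

fn main() {
    let text: Vec<char> = "abracadabra".chars().collect();
    // (substring, frequency) pairs taken from the usage examples.
    let expected: [(&str, usize); 5] = [("abra", 2), ("a", 5), ("bra", 2), ("ra", 2), ("", 11)];
    for (sub, freq) in expected {
        let pattern: Vec<char> = sub.chars().collect();
        assert_eq!(count_occurrences(&text, &pattern), freq, "substring {:?}", sub);
    }
    println!("all frequencies match");
}
```

Not every repeated substring is listed: "b" and "br" also occur twice, but both are always followed by "ra", so they appear to be folded into "bra". This is consistent with the iterator listing only branching (internal) nodes of the suffix tree.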

This code implements a fast suffix tree / suffix array.

This code is taken from sentencepiece and is intended to be used by Hugging Face.
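For readers new to the data structure: a suffix array lists the start positions of all suffixes of a string in lexicographic order, and the companion LCP array records how many leading characters each suffix shares with the one before it. The sketch below builds both naively for "abracadabra". naive_suffix_array and naive_lcp are hypothetical names, and this O(n^2 log n) approach is not what esaxx does; as far as we know, it builds the suffix array with the linear-time SA-IS algorithm.

```rust
/// Build a suffix array naively: sort all suffix start positions by
/// comparing the suffixes lexicographically.
fn naive_suffix_array(text: &[char]) -> Vec<usize> {
    let mut sa: Vec<usize> = (0..text.len()).collect();
    sa.sort_by(|&a, &b| text[a..].cmp(&text[b..]));
    sa
}

/// lcp[i] = length of the longest common prefix of the suffixes starting
/// at sa[i - 1] and sa[i]; lcp[0] is 0 by convention.
fn naive_lcp(text: &[char], sa: &[usize]) -> Vec<usize> {
    let mut lcp = vec![0; sa.len()];
    for i in 1..sa.len() {
        lcp[i] = text[sa[i - 1]..]
            .iter()
            .zip(&text[sa[i]..])
            .take_while(|(x, y)| x == y)
            .count();
    }
    lcp
}

fn main() {
    let text: Vec<char> = "abracadabra".chars().collect();
    let sa = naive_suffix_array(&text);
    let lcp = naive_lcp(&text, &sa);
    // One row per suffix: start position, LCP with previous row, suffix text.
    for (&start, &l) in sa.iter().zip(&lcp) {
        let suffix: String = text[start..].iter().collect();
        println!("{:>2} {:>2} {}", start, l, suffix);
    }
}
```

For this input the suffix array is [10, 7, 0, 3, 5, 8, 1, 4, 6, 9, 2]. Runs of adjacent suffixes sharing a prefix line up with the frequencies in the usage examples: five sorted suffixes start with "a", two with "abra", two with "bra" and two with "ra".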

No runtime deps