1 unstable release
0.1.0 | Feb 26, 2024 |
---|
#924 in Text processing
Used in 2 crates
7KB
103 lines
Duval algorithm for Lyndon factorization and smallest rotation computation
We provide implementation of Duval algorithm for Lyndon factorization and smallest rotation computation in C++, Rust and Python.
Our code for the Lyndon factorization is based on this implementation.
Our code for the smallest rotation computation is custom and highly optimized (for the C++ and Rust versions).
Usage
Pure Python
Just use the following code snippet (from duval_pure.py
)
def min_rotation(s):
N = len(s)
s += s
a = b = 0
while b < N:
for i in range(N - a):
sai = s[a + i]
sbi = s[b + i]
if sai < sbi or a + i == b:
if i:
b += i - 1
break
if sai > sbi:
a = b
break
b += 1
return a
Call C++ from Python
Requires pybind11 (pip install pybind11
).
Install with pip install ./duval-py
from duval import duval, min_rotation
s = "abracadabra"
print(duval(s))
print(min_rotation(s))
C++
Simply include duval-cpp/duval.hpp
.
#include <iostream>
#include <string>
#include "duval.hpp"
int main()
{
std::string s = "abracadabra";
for (auto &factor : duval(s))
std::cout << factor << std::endl;
std::cout << min_rotation(s) << std::endl;
}
Rust
Code is provided in duval-rs
.
Testing
We provide simple tests for the Python, C++ and Rust versions.
C++
cd duval-cpp && g++ --std=c++11 -O3 test.cpp -o test && ./test && cd ..
We do extended testing for the C++ version with random strings against a very naive implementation. The random strings are generated with fixed seeds to ensure reproducibility.
We also display the performance. On our laptop, computing min_rotation
for 10000 strings (half random and half random) of length 10000 takes 0.2s.
Python
python duval-py/test.py
We test both the pure Python and the C++ version and compare their performance. The Python version is about 150x slower than the C++ version.
Rust
cargo test --manifest-path=duval-rs/Cargo.toml --release -- --nocapture
The Rust version has about the same performance as the C++ version.
Dependencies
~315KB