11 unstable releases (3 breaking)
Version | Released
---|---
0.4.4 | Oct 6, 2019
0.4.3 | Oct 6, 2019
0.3.0 | Oct 6, 2019
0.2.4 | Oct 6, 2019
0.1.0 | Oct 5, 2019
trigram
This Rust crate contains functions for fuzzy string matching. It exports two functions: similarity, which returns the similarity of two strings, and find_words_iter, which returns an iterator of matches for a smaller string (needle) in a larger string (haystack).
The similarity of strings is computed based on their trigrams, meaning their 3-character substrings: https://en.wikipedia.org/wiki/Trigram.
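Here is a minimal sketch of the idea. It assumes the pg_trgm conventions described in the Background section: text is lowercased, non-alphanumeric characters separate words, each word is padded with two leading spaces and one trailing space, and the score is the number of shared distinct trigrams divided by the number of distinct trigrams found in either string. The names trigrams and similarity_sketch are hypothetical. This is not the crate's source.

```rust
use std::collections::HashSet;

/// Hypothetical helper (not the crate's API): the distinct trigrams of `s`,
/// using pg_trgm-style padding of two leading spaces and one trailing space per word.
fn trigrams(s: &str) -> HashSet<String> {
    let mut set = HashSet::new();
    // Assumption: any non-alphanumeric character acts as a word separator.
    for word in s.split(|c: char| !c.is_alphanumeric()).filter(|w| !w.is_empty()) {
        let padded: Vec<char> = format!("  {} ", word.to_lowercase()).chars().collect();
        // Slide a 3-character window over the padded, lowercased word.
        for window in padded.windows(3) {
            set.insert(window.iter().collect::<String>());
        }
    }
    set
}

/// Hypothetical sketch: shared trigrams divided by all distinct trigrams.
fn similarity_sketch(a: &str, b: &str) -> f32 {
    let (ta, tb) = (trigrams(a), trigrams(b));
    let union = ta.union(&tb).count();
    if union == 0 {
        // Neither string contains a word, so there is nothing to compare.
        return 0.0;
    }
    let shared = ta.intersection(&tb).count();
    shared as f32 / union as f32
}
```

Because the trigrams are stored in a HashSet, a trigram that repeats counts only once, so the ratio measures set overlap. Under these assumptions, similarity_sketch("color", "colour") returns 4/9, the same value printed by the similarity example.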
Trying it out
Here is how to run the examples:
$ cargo run --example similarity color colour
...
0.44444445
$ cargo run --example find_words_iter
bufalo
buffalow
Bungalo
biffalo
buffaloo
huffalo
snuffalo
fluffalo
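The needle, haystack, and threshold used by the example are not shown above. Output like this does fit a simple model, though: split the haystack into words and keep every word whose trigram similarity to the needle clears a threshold. The sketch below implements that model under stated assumptions. fuzzy_words and word_trigrams are hypothetical names, the code is not the crate's find_words_iter, and the 0.3 threshold in the usage note is pg_trgm's default similarity threshold, not a value taken from this crate.

```rust
use std::collections::HashSet;

/// Hypothetical helper: distinct trigrams of one word, lowercased and
/// padded with two leading spaces and one trailing space (pg_trgm style).
fn word_trigrams(word: &str) -> HashSet<String> {
    let padded: Vec<char> = format!("  {} ", word.to_lowercase()).chars().collect();
    padded.windows(3).map(|w| w.iter().collect::<String>()).collect()
}

/// Hypothetical sketch (not the crate's find_words_iter): lazily yields each
/// alphanumeric word of `haystack` whose trigram similarity to `needle`
/// is at least `threshold`. Matches are slices of the haystack, so their
/// original casing is preserved.
fn fuzzy_words<'h>(needle: &str, haystack: &'h str, threshold: f32) -> impl Iterator<Item = &'h str> {
    // Compute the needle's trigrams once; the closure below owns them.
    let target = word_trigrams(needle);
    haystack
        .split(|c: char| !c.is_alphanumeric())
        .filter(|w| !w.is_empty())
        .filter(move |w| {
            let t = word_trigrams(w);
            let shared = t.intersection(&target).count() as f32;
            // A non-empty padded word always has trigrams, so this is never zero.
            let union = t.union(&target).count() as f32;
            shared / union >= threshold
        })
}
```

For example, fuzzy_words("buffalo", "A bufalo, a buffalow and a Bungalo met a cow.", 0.3) yields bufalo, buffalow and Bungalo. Bungalo only just clears the bar at 4/12 ≈ 0.33, because it shares only the two padded leading trigrams plus "alo" and "lo " with buffalo.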
Usage
Add this to your Cargo.toml:
[dependencies]
trigram = "0.4.4"
and call it like this:
use trigram::similarity;

fn main() {
    // Prints the trigram similarity of the two words, a score from 0 to 1.
    println!("{}", similarity("rustacean", "crustacean"));
}
Background
The similarity function in this crate is a reverse-engineered approximation of the similarity function in the PostgreSQL pg_trgm extension: https://www.postgresql.org/docs/9.1/pgtrgm.html. It gives exactly the same answers in many cases but may disagree in others, although no such case is known yet. If you find a case where the answers don't match, please file an issue about it!
A good introduction to the Postgres version of this is given on Stack Overflow: https://stackoverflow.com/a/43161051/484529.
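As the pg_trgm documentation and the Stack Overflow answer above describe it, the score is a ratio of trigram sets. Let T(s) be the set of distinct trigrams of s after each word is padded with two leading spaces and one trailing space. That this crate computes exactly this ratio is an assumption, but it is consistent with the 0.44444445 printed for color and colour. T(color) has 6 trigrams ("  c", " co", "col", "olo", "lor", "or ") and T(colour) has 7 ("  c", " co", "col", "olo", "lou", "our", "ur "). Of these, 4 are shared, so by inclusion-exclusion the union has 6 + 7 - 4 = 9 elements:

```latex
\operatorname{sim}(a, b) = \frac{\lvert T(a) \cap T(b) \rvert}{\lvert T(a) \cup T(b) \rvert}
\qquad
\operatorname{sim}(\text{color}, \text{colour}) = \frac{4}{6 + 7 - 4} = \frac{4}{9} \approx 0.4444
```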