#levenshtein

weighted_levenshtein

Generic implementation of Levenshtein distance allowing arbitrary weighting of operations

2 unstable releases

0.2.0 Oct 3, 2020
0.1.0 Sep 20, 2020

#2000 in Algorithms

MIT license

15KB
299 lines

README

Docs

A generic implementation of the Levenshtein distance that allows arbitrarily weighting operations for different elements.

Generic

This crate can work on slices of any kind. It can:

  • Compute a distance in characters between two strings:
assert_eq!(distance("abc", "aaxcc"), 3);
  • Compute a distance in words between two strings:
assert_eq!(
   distance(
      "The quick brown fox".split (' ').collect::<Vec<_>>(),
      "The very quick brown cat".split (' ').collect()),
   2);
  • Or compute a distance between arbitrary sequences:
assert_eq!(distance(vec![1, 2, 3], vec![0, 1, 3, 3, 4]), 3);

Weighting

This crate allows defining custom weights for each operation on each symbol. These weights can be specified for custom types by implementing the EditWeight trait.

For example:

enum MyType {
  A,
  B,
}

impl EditWeight for MyType {
  fn add_cost(&self) -> usize {
    match *self {
      MyType::A => 1,
      MyType::B => 2,
    }
  }
  fn rm_cost(&self) -> usize {
    match *self {
      MyType::A => 1,
      MyType::B => 2,
    }
  }
  fn sub_cost(&self, other: &Self) -> usize {
    if self == other {
      0
    } else {
      3
    }
  }
}

assert_eq!(distance(vec![MyType::A], vec![MyType::B, MyType::B]), 5)

No runtime deps