#diff-match-patch #diff #patch #match #text-synchronization

diff-match-patch-rs

A high-performance port of Myer's diff algorithm to perform the operations required for synchronizing plain text

5 unstable releases

new 0.3.1 Nov 7, 2024
0.3.0 Sep 17, 2024
0.2.1 Sep 2, 2024
0.2.0 Sep 2, 2024
0.1.0-beta.4 Aug 29, 2024

#206 in Algorithms

MIT/Apache

240KB
4.5K SLoC

diff-match-patch-rs: Efficient port of Google's diff-match-patch implemented in Rust

github crates.io docs.rs

A very fast, accurate and wasm ready port of Diff Match Patch in Rust. The diff implementation is based on Myers' diff algorithm.

Highlights of this crate

  • Exposes two modes of operating with diff-match-patch, a Efficient mode and Compat mode. While the Efficient mode squeezes out the max performance the Compat mode ensures compatibility across other libraries or implementations (rust or otherwise). According to Benchmarks, our slower Compat mode is still faster than other implementations in rust.
    • Efficient mode works on &[u8] and the generated diffs break compatibility with other implementation. Use the Efficient mode ONLY if you are using this crate at the source of diff generation and the destination.
    • Compat mode on the other hand works on &[char] and the generated diffs and patches are compatible across other implementations of diff-match-patch. Checkout tests/compat.rs for some test cases around this.
  • wasm ready, you can check out a demo here
  • Accurate, while working on this crate I've realized that there are a bunch of implementations that have major issues (wrong diffs, inaccurate flows, silent errors etc.).
  • Helper method pretty_html provided by this crate allows some configurations to control the generated visuals elements.
  • Well tested
  • Added a fuzzer for sanity
  • Exposes the same APIs as Diff Match Patch with minor changes to make it more idiomatic in Rust.

Usage Examples

[dependencies]
diff-match-patch-rs = "0.3.0"

Effitient mode

use diff_match_patch_rs::{DiffMatchPatch, Efficient, Error, PatchInput};

// This is the source text
const TXT_OLD: &str = "I am the very model of a modern Major-General, I've information on vegetable, animal, and mineral, ๐Ÿš€๐Ÿ‘๐Ÿ‘€";

// Let's assume this to be the text that was editted from the source text
const TXT_NEW: &str = "I am the very model of a cartoon individual, My animation's comical, unusual, and whimsical.๐Ÿ˜Š๐Ÿ‘€";

// An example of a function that creates a diff and returns a set of patches serialized
fn at_source() -> Result<String, Error> {
    // initializing the module
    let dmp = DiffMatchPatch::new();
    // create a list of diffs
    let diffs = dmp.diff_main::<Efficient>(TXT_OLD, TXT_NEW)?;
    // Now, we are going to create a list of `patches` to be applied to the old text to get the new text
    let patches = dmp.patch_make(PatchInput::new_diffs(&diffs))?;
    // in the real world you are going to transmit or store this diff serialized to undiff format to be consumed or used somewhere elese
    let patch_txt = dmp.patch_to_text(&patches);

    Ok(patch_txt)
}

fn at_destination(patches: &str) -> Result<(), Error> {
    // initializing the module
    let dmp = DiffMatchPatch::new();
    // lets recreate the diffs from patches
    let patches = dmp.patch_from_text::<Efficient>(patches)?;
    // Now, lets apply these patches to the `old_txt` which is the original to get the new text
    let (new_txt, ops) = dmp.patch_apply(&patches, TXT_OLD)?;
    // Lets print out if the ops succeeded or not
    ops.iter()
        .for_each(|&o| println!("{}", if o { "OK" } else { "FAIL" }));

    // If everything goes as per plan you should see
    // OK
    // OK
    // ... and so on

    // lets check out if our `NEW_TXT` (presumably the edited one)
    if new_txt != TXT_NEW {
        return Err(Error::InvalidInput);
    }

    println!("Wallah! Patch applied successfully!");

    Ok(())
}

fn main() -> Result<(), Error> {
    // At the source of diff where the old text is being edited we'll create a set of patches
    let patches = at_source()?;
    // We'll send this diff to some destination e.g. db or the client where these changes are going to be applied
    // The destination will receive the patch string and will apply the patches to recreate the edits
    at_destination(&patches)
}

Compat mode

use diff_match_patch_rs::{DiffMatchPatch, Compat, Error, PatchInput};

// This is the source text
const TXT_OLD: &str = "I am the very model of a modern Major-General, I've information on vegetable, animal, and mineral, ๐Ÿš€๐Ÿ‘๐Ÿ‘€";

// Let's assume this to be the text that was editted from the source text
const TXT_NEW: &str = "I am the very model of a cartoon individual, My animation's comical, unusual, and whimsical.๐Ÿ˜Š๐Ÿ‘€";

// An example of a function that creates a diff and returns a set of patches serialized
fn at_source() -> Result<String, Error> {
    // initializing the module
    let dmp = DiffMatchPatch::new();
    // create a list of diffs
    let diffs = dmp.diff_main::<Compat>(TXT_OLD, TXT_NEW)?;
    // Now, we are going to create a list of `patches` to be applied to the old text to get the new text
    let patches = dmp.patch_make(PatchInput::new_diffs(&diffs))?;
    // in the real world you are going to transmit or store this diff serialized to undiff format to be consumed or used somewhere elese
    let patch_txt = dmp.patch_to_text(&patches);

    Ok(patch_txt)
}

fn at_destination(patches: &str) -> Result<(), Error> {
    // initializing the module
    let dmp = DiffMatchPatch::new();
    // lets recreate the diffs from patches
    let patches = dmp.patch_from_text::<Compat>(patches)?;
    // Now, lets apply these patches to the `old_txt` which is the original to get the new text
    let (new_txt, ops) = dmp.patch_apply(&patches, TXT_OLD)?;
    // Lets print out if the ops succeeded or not
    ops.iter()
        .for_each(|&o| println!("{}", if o { "OK" } else { "FAIL" }));

    // If everything goes as per plan you should see
    // OK
    // OK
    // ... and so on

    // lets check out if our `NEW_TXT` (presumably the edited one)
    if new_txt != TXT_NEW {
        return Err(Error::InvalidInput);
    }

    println!("Wallah! Patch applied successfully!");

    Ok(())
}

fn main() -> Result<(), Error> {
    // At the source of diff where the old text is being edited we'll create a set of patches
    let patches = at_source()?;
    // We'll send this diff to some destination e.g. db or the client where these changes are going to be applied
    // The destination will receive the patch string and will apply the patches to recreate the edits
    at_destination(&patches)
}

Match - fuzzy match of pattern in Text

use diff_match_patch_rs::{DiffMatchPatch, Compat, Error, PatchInput};

// This is the source text
const TXT: &str = "I am the very model of a modern Major-General, I've information on vegetable, animal, and mineral, ๐Ÿš€๐Ÿ‘๐Ÿ‘€";

// The patter we are trying to fing
const PATTERN: &str = " that berry ";

// Returns `location` of match if found, `None` if not found
fn main() -> Option<usize> {
    let dmp = DiffMatchPatch::new();
    
    // works with both `Efficient` and `Compat` modes
    // `5` here is an approx location to find `nearby` matches
    dmp.match_main::<Efficient>(TXT, PATTERN, 5) // this should return Some(4)
}

Note

The Efficient and Compat mode APIs are identical with the only chage being the generic parameter declared during the calls.

E.g. we initiated a diff in the Efficient mode with dmp.diff_main::<Efficient>( ... ) while for Compat mode we did dmp.diff_main::<Compat>( ... ).

Please checkout the examples directory of the source repo for a few common use-cases.

The `Effitient` and `Compat` modes are mutually exclusive and will not generate correct output if used interchangibly at source and destination

Benchmarks

Benchmarks are maintained diff-match-patch-bench repository

Lang. Library Diff Avg. Patch Avg. Bencher Mode Correct
rust diff_match_patch v0.1.1[^2] 68.108 ms 10.596 ms Criterion - โœ…
rust dmp v0.2.0 69.019 ms 14.654 ms Criterion - โœ…
rust diff-match-patch-rsour 64.66 ms 631.13 ยตs Criterion Efficient โœ…
rust diff-match-patch-rsour 64.68 ms 1.1703 ms Criterion Compat โœ…
go go-diff 50.31 ms 135.2 ms go test - โœ…
node diff-match-patch[^1] 246.90 ms 1.07 ms tinybench - โŒ
python diff-match-patch 1.01 s 0.25 ms timeit - โœ…

[^1]: diff-match-patch generated patch text and delta breaks on unicode surrogates. [^2]: Adds an extra clone to the iterator because the patch_apply method takes mutable refc. to diffs.

Gotchas

Diff incompatibility with JavaScript libs:

There are 2 kinds of implementations - one which use a postprocessing function for merging unicode surrogates which break compatibility with every other popular diff-match-patch implementations and the other kind (packages based on the original implementation) break while urlEncode() of unicode surrogates. As of now, this crate brakes compatibility while working with JS generated diffs with the surrogate patch. If you are interfacing with JavaScript in browser, using this crate through wasm would be ideal.

Diff Match Patch was originally built in 2006 to power Google Docs.

Dependencies

~10โ€“290KB