#rsync #diff #rdiff

dach

A general delta encoded network difference algorithm

2 unstable releases

0.3.0 Oct 11, 2024
0.0.0 Oct 4, 2024

#890 in Algorithms

LGPL-3.0-or-later

105KB
2K SLoC

dach

dach is a difference algorithm. It works on general data, both binary and text.

dach is heavily inspired by rsync and it's rolling hash concept.

See the documentation for details on how to use dach.


lib.rs:

A difference library similar to rsync.

Performance

dach is performant enough for large files and files, exceeding 10MB (when using rolling hashes). Please compile with the release preset for 10X the performance.

Allocating data and keeping it in memory is very fast compared to hashing. In the future, Dach will support reading data bit by bit, greatly reducing the memory usage.

How-to & examples

Keep in mind this isn't guaranteed to give the exact same data. Please check the data with a secure hashing algorithm (e.g. SHA-3) to ensure consistency.

Sending the data is possible due to serde providing serialization and deserialization. This requires the cargo feature serde to be enabled. You serialize all the structs in this library to any format.

These examples should cover what rsync does.

Get a remote's data

To get someone else's data, we construct a Signature and send it. The remote calculates a Difference using Signature::diff. The remote sends back the Difference which we Difference::apply.

Push my data to remote

Request the remote's Signature. They calculate it and respond. We calculate a Signature::diff and send it to them. They Difference::apply it. Their data should now be equal to mine.

Get the difference of a local file

Gets a small diff to send to others, almost like how git works.

base_data is considered prior knowledge. target_data is the modified data.

The data segments can be any size. Performance should still be good.

let base_data = b"This is a document everyone has. It's about some new difference library.";
let target_data = b"This is a document only I have. It's about some new difference library.";

let mut signature = Signature::new(128);
signature.write(base_data);
let signature = signature.finish();

let diff = signature.diff(target_data);

// This is the small diff you could serialize with Serde and send.
let minified = diff.minify(8, base_data)
    .expect("This won't panic, as the data hasn't changed from calling the other functions.");

Future improvements

  • Rolling hash
  • Multi-threaded Signature::diff. There is no feasible way to implement this, as we look ahead and change which window we're looking at after each iteration. Now with rolling hash, the performance is great.
  • Support read/write
    • Support to diff a reader
    • Support to apply to a writer
    • Fetch API for apply to get data on demand.
      • This could slow things down dramatically.
    • Implement Write for HashBuilder.
  • Use SHA(1|256?) to verify integrity of data. Bundled with the Signature. The implementer should provide this.

Dependencies

~350–570KB