quickcdc

A fast content-defined chunker for u8 slices

1 stable release

1.0.0 Dec 17, 2018

MIT/Apache

10KB
114 lines

Summary

quickcdc is a fast content-defined chunker for &[u8] slices.

  • For some background information, see "AE: An Asymmetric Extremum Content Defined Chunking Algorithm" by Yucheng Zhang.
  • Modifications:
    • The user may provide a salt, which introduces entropy and varies the cutpoints (i.e. files re-processed with a different salt value will produce different cutpoints).
    • Warp forward (reduced window size): skips unnecessary processing that would otherwise happen before the minimum chunk size is reached.
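
To make the two modifications concrete, here is a minimal, standard-library-only sketch of an asymmetric-extremum style chunker. It is not quickcdc's actual code: the names `value_at`, `next_cut`, and `chunk`, the `skip` and `window` parameters, and the salt-mixing step are all assumptions made for illustration. The scan starts `skip` bytes into each chunk (one possible reading of "warp forward"), tracks the largest salted value seen so far, and cuts once `window` positions have passed without a new maximum.

```rust
/// Hypothetical position value: up to 8 bytes starting at `i`, mixed with the salt.
/// (Assumption for illustration; not taken from the quickcdc source.)
fn value_at(data: &[u8], i: usize, salt: u64) -> u64 {
    let mut buf = [0u8; 8];
    let end = (i + 8).min(data.len());
    buf[..end - i].copy_from_slice(&data[i..end]);
    // XOR with the salt, then multiply by an odd constant, so that a different
    // salt reorders the values and therefore tends to move the extremum.
    (u64::from_le_bytes(buf) ^ salt).wrapping_mul(0x9E37_79B9_7F4A_7C15)
}

/// Length of the next chunk at the start of `data` (never more than `max_size`).
fn next_cut(data: &[u8], skip: usize, window: usize, max_size: usize, salt: u64) -> usize {
    let limit = data.len().min(max_size);
    if skip >= limit {
        return limit; // too little data left to scan
    }
    // Start scanning at `skip` instead of 0: positions before it could not
    // produce a chunk of the desired minimum size anyway.
    let mut max_pos = skip;
    let mut max_val = value_at(data, skip, salt);
    for i in (skip + 1)..limit {
        let v = value_at(data, i, salt);
        if v > max_val {
            // New extremum: restart the window from here.
            max_val = v;
            max_pos = i;
        } else if i == max_pos + window {
            // `window` positions passed without a larger value: cut after `i`.
            return i + 1;
        }
    }
    limit // no cutpoint found: fall back to the size limit
}

/// Split `data` into consecutive chunks using `next_cut`.
fn chunk(mut data: &[u8], skip: usize, window: usize, max_size: usize, salt: u64) -> Vec<&[u8]> {
    assert!(max_size > 0, "max_size must be non-zero");
    let mut out = Vec::new();
    while !data.is_empty() {
        let (head, tail) = data.split_at(next_cut(data, skip, window, max_size, salt));
        out.push(head);
        data = tail;
    }
    out
}
```

Calling `chunk(&bytes, 16, 24, 128, salt)` returns slices that concatenate back to `bytes`. Running it again with a different `salt` will usually move the cutpoints, because the salted values are ordered differently.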

This should be faster than many CDC algorithms (anecdotal performance: 2 GB/s on an AMD 1950X with an NVMe drive), but faster alternatives exist.

  • For more information, see FastCDC

NOTE: This implementation performs much faster when built with --release.
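
To check throughput on your own hardware, a small timing helper along these lines may be enough. This is a sketch using only the standard library; `timed` and `gb_per_s` are hypothetical helper names, not part of quickcdc. Timings only mean something when the binary is built with --release.

```rust
use std::time::{Duration, Instant};

/// Run `work` once and return its result together with the elapsed wall-clock time.
fn timed<T>(work: impl FnOnce() -> T) -> (T, Duration) {
    let start = Instant::now();
    let out = work();
    (out, start.elapsed())
}

/// Convert a byte count and a duration into gigabytes per second (1 GB = 10^9 bytes).
fn gb_per_s(bytes: usize, elapsed: Duration) -> f64 {
    bytes as f64 / 1e9 / elapsed.as_secs_f64()
}
```

Wrap the chunking loop in the closure passed to `timed`, have it return something derived from the chunks (for example the chunk count) so the optimizer cannot discard the work, then pass the input length and the elapsed time to `gb_per_s`.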

Example

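use quickcdc;
use rand::Rng;

// Fill a 1024-byte buffer with random data to chunk.
let mut rng = rand::thread_rng();
let mut sample = [0u8; 1024];
rng.fill(&mut sample[..]);

// Chunking parameters (sizes in bytes).
let target_size = 64;
let max_chunksize = 128;
// Re-processing the same data with a different salt produces different cutpoints.
let salt = 15222894464462204665;

// The chunker is an iterator; print the length of each chunk it yields.
let chunker = quickcdc::Chunker::with_params(&sample[..], target_size, max_chunksize, salt).unwrap();
for x in chunker {
    println!("{}", x.len());
}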

Dependencies

~580–800KB
~11K SLoC