gearhash

Fast, SIMD-accelerated hash function for content-defined chunking

4 releases

0.1.3 Apr 12, 2020
0.1.2 Dec 13, 2019
0.1.1 Dec 13, 2019
0.1.0 Dec 9, 2019

MIT/Apache

24KB
593 lines

The GEAR hashing function is a fast rolling hash function that is well suited for content-defined chunking.

In particular, this function is used as a building block for the FastCDC algorithm.
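To make the rolling behaviour concrete, here is a minimal scalar sketch of a Gear-style hash and a chunker built on it, using only the standard library. It is not this crate's code: the names `build_table`, `next_match` and `chunk` are hypothetical, the SplitMix64-generated table is a stand-in for whatever table the crate ships, and the update rule `hash = (hash << 1) + table[byte]` with the boundary test `hash & mask == 0` is the usual formulation of Gear, assumed here rather than taken from the crate's source.

```rust
/// Builds a 256-entry table of pseudo-random 64-bit values, one per byte value.
/// Assumption: SplitMix64 with a fixed seed stands in for the crate's own table.
fn build_table() -> [u64; 256] {
    let mut table = [0u64; 256];
    let mut state: u64 = 0x9E37_79B9_7F4A_7C15;
    for entry in table.iter_mut() {
        state = state.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        *entry = z ^ (z >> 31);
    }
    table
}

/// Feeds `buf` into the rolling hash and returns the offset just past the
/// first byte at which `hash & mask == 0`, or `None` if no byte matches.
fn next_match(hash: &mut u64, table: &[u64; 256], buf: &[u8], mask: u64) -> Option<usize> {
    for (i, &byte) in buf.iter().enumerate() {
        // Gear update: the shift ages out old bytes (a byte's contribution is
        // gone after 64 steps), then the new byte's table value is mixed in.
        *hash = (*hash << 1).wrapping_add(table[byte as usize]);
        if *hash & mask == 0 {
            return Some(i + 1);
        }
    }
    None
}

/// Splits `buf` into content-defined chunks; the hash state carries over
/// from one chunk to the next.
fn chunk<'a>(buf: &'a [u8], table: &[u64; 256], mask: u64) -> Vec<&'a [u8]> {
    let mut chunks = Vec::new();
    let mut hash = 0u64;
    let mut offset = 0;
    while let Some(boundary) = next_match(&mut hash, table, &buf[offset..], mask) {
        chunks.push(&buf[offset..offset + boundary]);
        offset += boundary;
    }
    // Whatever follows the last boundary forms the final chunk.
    if offset < buf.len() {
        chunks.push(&buf[offset..]);
    }
    chunks
}
```

Because each hash value depends only on the most recent 64 bytes, inserting a byte near the start of the input shifts later boundaries by one position instead of moving them arbitrarily. That property is what makes the function useful for content-defined chunking.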

This crate provides a simple scalar variant as well as optimized versions for the SSE4.2 and AVX2 instruction sets.
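One common way to choose between such variants is runtime CPU feature detection. The sketch below illustrates that general pattern with the standard library's `is_x86_feature_detected!` macro. The function name `select_variant` is hypothetical, and nothing here is taken from this crate's source, which may select its implementation differently.

```rust
/// Names the widest of the three variants that the current CPU supports.
/// On non-x86 targets the detection block is compiled out and the scalar
/// path is always chosen.
fn select_variant() -> &'static str {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        // Check the wider instruction set first.
        if is_x86_feature_detected!("avx2") {
            return "avx2";
        }
        if is_x86_feature_detected!("sse4.2") {
            return "sse4.2";
        }
    }
    "scalar"
}
```

In a real dispatcher each branch would call a function compiled with the matching `#[target_feature]` attribute. Returning a name keeps the sketch free of `unsafe` code.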

Usage

use gearhash::Hasher;

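// `buf` (the input bytes) and `MASK` (the bit mask that selects chunk
// boundaries) are not defined in this snippet; the caller supplies both

// set up initial state
let mut chunks = vec![];
let mut offset = 0;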

// create new hasher
let mut hasher = Hasher::default();

// loop through all matches, and push the corresponding chunks
while let Some(boundary) = hasher.next_match(&buf[offset..], MASK) {
    chunks.push(&buf[offset..offset + boundary]);
    offset += boundary;
}

// push final chunk
chunks.push(&buf[offset..]);
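The snippet above leaves `MASK` undefined. As a rough guide to choosing one, the sketch below assumes that a boundary is declared when `hash & mask == 0`, the usual Gear convention (check the crate's documentation to confirm). Under that assumption, and on random-looking data, each set bit in the mask halves the probability of a boundary, so a mask with `k` set bits gives an expected chunk length of about `2^k` bytes. The helper names are hypothetical. The sketch also assumes the hash shifts left on every update, which is why it uses the top bits: they are the ones influenced by the largest number of recent bytes.

```rust
/// Returns a mask with the `bits` most significant bits set.
fn mask_with_top_bits(bits: u32) -> u64 {
    assert!(bits <= 64, "a u64 mask has at most 64 bits");
    if bits == 0 {
        0
    } else {
        u64::MAX << (64 - bits)
    }
}

/// Approximate expected chunk length in bytes for `mask`, assuming each
/// masked bit of the hash is zero with probability 1/2.
fn expected_chunk_len(mask: u64) -> u128 {
    1u128 << mask.count_ones()
}
```

For example, `mask_with_top_bits(13)` yields `0xFFF8_0000_0000_0000`, for an expected chunk length of 8192 bytes under these assumptions.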

Fuzzing

To check the memory safety of the unsafe SIMD code in this crate, we fuzz it with cargo-fuzz.

You can find the fuzzing targets under fuzz/fuzz_targets, which can be run using cargo fuzz run <target>.

License

This project is licensed under either of

Apache License, Version 2.0
MIT license

at your option.

Contribution

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in this project by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.
