4 releases
Version | Release date
---|---
0.1.3 (latest) | Feb 18, 2025
0.1.2 | Feb 6, 2025
0.1.1 | Feb 2, 2025
0.1.0 | Feb 2, 2025
#528 in Algorithms
424 downloads per month
Used in chunkfs
56KB, 1.5K SLoC
rust-chunking
Implementations of content-based chunking algorithms:
- RabinCDC (taken from zbox)
- Leap-based CDC
  - Matrix generation code can be found in ef_matrix.rs
- UltraCDC
- SuperCDC
- SeqCDC
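To illustrate what content-based boundary detection involves, here is a minimal sketch that finds cut points with a sliding-window polynomial rolling hash (Rabin-Karp style, arithmetic mod 2^64). It is not the crate's RabinCDC code, which is taken from zbox and may use a true Rabin fingerprint; the window length, the multiplier and the functions `first_chunk_len` and `chunk_lengths` are hypothetical. A cut is declared after the first position, at or past the minimum size, where the top log2(avg) bits of the hash are zero (probability 1/avg per position on random input); otherwise the maximum size forces a cut.

```rust
// Hypothetical sketch: hash-based chunk boundaries with a sliding-window
// polynomial rolling hash. Not the crate's RabinCDC implementation.

/// Number of trailing bytes that influence the hash (assumed value).
const WINDOW: usize = 48;
/// Odd multiplier for the polynomial hash (assumed value).
const BASE: u64 = 0x0000_0100_0000_01B3;

/// Length of the first chunk of `data`: cut after the first byte, at or past
/// `min` (and once the window is full), where the top log2(avg) bits of the
/// rolling hash are zero; otherwise cut at `max` or the end of the data.
fn first_chunk_len(data: &[u8], min: usize, avg: usize, max: usize) -> usize {
    assert!(avg >= 2 && avg.is_power_of_two() && min <= avg && avg <= max);
    if data.len() <= min {
        return data.len();
    }
    let shift = 64 - avg.trailing_zeros();
    let end = data.len().min(max);
    let first_cut = min.max(WINDOW);
    // BASE^WINDOW is the weight of the byte that is about to leave the window.
    let out_weight = (0..WINDOW).fold(1u64, |acc, _| acc.wrapping_mul(BASE));
    let mut hash = 0u64;
    for i in 0..end {
        // Shift in data[i]; +1 so that runs of zero bytes still change the hash.
        hash = hash.wrapping_mul(BASE).wrapping_add(data[i] as u64 + 1);
        if i >= WINDOW {
            // Remove data[i - WINDOW], which has just left the window.
            hash = hash.wrapping_sub(out_weight.wrapping_mul(data[i - WINDOW] as u64 + 1));
        }
        if i + 1 >= first_cut && (hash >> shift) == 0 {
            return i + 1;
        }
    }
    end
}

/// Splits `data` into consecutive chunk lengths by repeatedly cutting the rest.
fn chunk_lengths(data: &[u8], min: usize, avg: usize, max: usize) -> Vec<usize> {
    let mut lens = Vec::new();
    let mut rest = data;
    while !rest.is_empty() {
        let n = first_chunk_len(rest, min, avg, max);
        lens.push(n);
        rest = &rest[n..];
    }
    lens
}
```

Because each cut depends only on the bytes inside the window, inserting a few bytes near the start of the data tends to move only nearby boundaries, while later chunks keep their content.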
A simple program for testing an algorithm is provided in filetest.rs.
Features
- Chunkers implement the std::iter::Iterator trait, yielding information about the source data in the form of chunks (each chunk's start position and length).
- Chunk sizes can be customized on creation; default size values are provided.
- Other parameters from the corresponding papers can also be modified when creating a chunker.
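The iterator-based design described in the feature list can be sketched as follows. This is a minimal, hypothetical example rather than the crate's code: `SizeParams`, `Chunk` and `GearChunker` are illustrative types whose field names (`min`, `avg`, `max`, `pos`, `len`) are assumptions chosen to mirror the usage example, and the boundary test uses a gear rolling hash (as popularized by FastCDC) instead of any of the algorithms listed above. Each call to `next` scans forward from the current offset and yields one `Chunk` describing a byte range of the source data.

```rust
/// Hypothetical size limits (not the crate's SizeParams): min, average, max length.
#[derive(Clone, Copy, Debug)]
pub struct SizeParams {
    pub min: usize,
    pub avg: usize,
    pub max: usize,
}

/// Hypothetical chunk descriptor: byte offset into the source and length.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Chunk {
    pub pos: usize,
    pub len: usize,
}

/// Lazily splits a borrowed byte slice into content-defined chunks.
pub struct GearChunker<'a> {
    data: &'a [u8],
    pos: usize,
    sizes: SizeParams,
    mask: u64,
    gear: [u64; 256],
}

impl<'a> GearChunker<'a> {
    pub fn new(data: &'a [u8], sizes: SizeParams) -> Self {
        assert!(sizes.min <= sizes.avg && sizes.avg <= sizes.max);
        assert!(sizes.avg.is_power_of_two());
        // Fill the gear table with a fixed SplitMix64 sequence, so the same
        // input always produces the same chunk boundaries.
        let mut gear = [0u64; 256];
        let mut state = 0u64;
        for g in gear.iter_mut() {
            state = state.wrapping_add(0x9E37_79B9_7F4A_7C15);
            let mut z = state;
            z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
            z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
            *g = z ^ (z >> 31);
        }
        // Test the top log2(avg) bits: a cut then occurs with probability 1/avg per byte.
        let bits = sizes.avg.trailing_zeros();
        let mask = if bits == 0 { 0 } else { u64::MAX << (64 - bits) };
        GearChunker { data, pos: 0, sizes, mask, gear }
    }
}

impl Iterator for GearChunker<'_> {
    type Item = Chunk;

    fn next(&mut self) -> Option<Chunk> {
        let rest = &self.data[self.pos..];
        if rest.is_empty() {
            return None;
        }
        // Without an earlier content-defined cut, stop at `max` or the end of the data.
        let end = rest.len().min(self.sizes.max);
        let mut len = end;
        let mut hash = 0u64;
        for (i, &byte) in rest[..end].iter().enumerate() {
            // Gear hash: each step shifts older contributions one bit to the left.
            hash = (hash << 1).wrapping_add(self.gear[byte as usize]);
            if i + 1 >= self.sizes.min && (hash & self.mask) == 0 {
                len = i + 1;
                break;
            }
        }
        let chunk = Chunk { pos: self.pos, len };
        self.pos += len;
        Some(chunk)
    }
}
```

Because the chunker only borrows the data and tracks an offset, chunks are produced lazily and can be consumed with a `for` loop or with combinators such as `map` and `collect`.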
Usage
To use the algorithms in your own code, access them through their corresponding modules, e.g.:
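```rust
// Crate name assumed to be `chunking`; adjust the path to match your Cargo.toml.
use chunking::{leap_based, ultra, SizeParams};

fn main() {
    // 1 MiB of sample input data.
    let data = vec![1; 1024 * 1024];

    // UltraCDC with custom chunk size parameters (presumably min, avg, max).
    let sizes = SizeParams::new(4096, 8192, 16384);
    let chunker = ultra::Chunker::new(&data, sizes);
    for chunk in chunker {
        println!("start: {}, length: {}", chunk.pos, chunk.len);
    }

    // Leap-based CDC with its default size parameters.
    let default_leap = leap_based::Chunker::new(&data, SizeParams::leap_default());
    for chunk in default_leap {
        println!("start: {}, length: {}", chunk.pos, chunk.len);
    }
}
```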
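When comparing algorithms or size parameters, it can help to check that the reported chunks tile the input and to summarize their lengths. The sketch below assumes, as is usual for content-based chunking, that chunks are non-empty, contiguous and cover the whole input; `check_chunks` and `ChunkStats` are hypothetical helpers working on plain `(pos, len)` pairs, which could be collected from one of the chunkers above with something like `chunker.map(|c| (c.pos, c.len)).collect::<Vec<_>>()`, assuming both fields are `usize`.

```rust
/// Length statistics for one chunking run (hypothetical helper type).
#[derive(Debug, PartialEq)]
struct ChunkStats {
    count: usize,
    min_len: usize,
    max_len: usize,
    mean_len: f64,
}

/// Verifies that `(pos, len)` pairs are non-empty, start at offset 0, follow
/// each other without gaps or overlaps, and cover exactly `total` bytes.
/// On success, returns the count, minimum, maximum and mean chunk length.
fn check_chunks(chunks: &[(usize, usize)], total: usize) -> Result<ChunkStats, String> {
    let mut expected = 0;
    for (i, &(pos, len)) in chunks.iter().enumerate() {
        if pos != expected {
            return Err(format!("chunk {i} starts at {pos}, expected {expected}"));
        }
        if len == 0 {
            return Err(format!("chunk {i} is empty"));
        }
        expected += len;
    }
    if expected != total {
        return Err(format!("chunks cover {expected} bytes, input has {total}"));
    }
    let lens = chunks.iter().map(|&(_, len)| len);
    Ok(ChunkStats {
        count: chunks.len(),
        min_len: lens.clone().min().unwrap_or(0),
        max_len: lens.max().unwrap_or(0),
        // Mean is 0.0 for empty input to avoid dividing by zero.
        mean_len: if chunks.is_empty() { 0.0 } else { total as f64 / chunks.len() as f64 },
    })
}
```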
Dependencies
~3MB, ~42K SLoC