#sketch #streaming #item #top-k #sketch-data-structure

bin+lib heavykeeper

HeavyKeeper is for finding Top-K elephant flows with high precision and low memory footprint

8 releases

new 0.3.0 Mar 30, 2025
0.2.7 Mar 15, 2025
0.2.6 Feb 11, 2025
0.2.4 Nov 30, 2024
0.2.2 Jun 19, 2024

#268 in Data structures

Download history 14/week @ 2024-12-10 109/week @ 2025-02-04 113/week @ 2025-02-11 13/week @ 2025-02-18 41/week @ 2025-02-25 17/week @ 2025-03-04 156/week @ 2025-03-11 79/week @ 2025-03-18 105/week @ 2025-03-25

360 downloads per month

MIT/Apache

1MB
1K SLoC

heavykeeper-rs

Crates.io MIT / Apache 2.0 licensed Build Status

📖 Docs

Top-K Heavykeeper algorithm for Top-K elephant flows

This is based on the paper HeavyKeeper: An Accurate Algorithm for Finding Top-k Elephant Flows by Junzhi Gong, Tong Yang, Haowei Zhang, and Hao Li, Peking University; Steve Uhlig, Queen Mary, University of London; Shigang Chen, University of Florida; Lorna Uden, Staffordshire University; Xiaoming Li, Peking University

Example

See examples/basic.rs for a complete example, or examples/ip_files.rs for an example of counting top-k IP flows in a file.

Basic usage:

use heavykeeper::TopK;

// create a new TopK with k=10, width=1000, depth=4, decay=0.9
let mut topk: TopK<Vec<u8>> = TopK::new(10, 1000, 4, 0.9);

// add some items
topk.add(b"example item".to_vec());
topk.add(b"another item".to_vec());

// check the counts
for node in topk.list() {
    println!("{} {}", String::from_utf8_lossy(&node.item), node.count);
}

Other Implementations

Name Language Github Repo
SegmentIO Go https://github.com/segmentio/topk
Aegis Go https://github.com/go-kratos/aegis/blob/main/topk/heavykeeper.go
Tomasz Kolaj Go https://github.com/migotom/heavykeeper
HeavyKeeper Paper C++ https://github.com/papergitkeeper/heavy-keeper-project
Jigsaw-Sketch C++ https://github.com/duyang92/jigsaw-sketch-paper/tree/main/CPU/HeavyKeeper
Redis Bloom Heavykeeper C https://github.com/RedisBloom/RedisBloom/blob/master/src/topk.c
Count-Min-Sketch Rust https://github.com/alecmocatta/streaming_algorithms
Sliding Window HeavyKeeper Go https://github.com/keilerkonzept/topk

Running

An example driver program which can be used as a word count program can be found at main.rs.

Usage:

cargo build --release
target/release/heavykeeper -k 10 -w 8192 -d 2 -y 0.95 -f data/war_and_peace.txt

Running the basic example

cargo run --example basic --release

Running the IPv4 example

cargo run --example ip_files --release

Run the benchmarks

cargo bench

Benchmark the sample word count app

hyperfine 'target/release/heavykeeper -k 10 -w 8192 -d 2 -y 0.95 -f data/war_and_peace.txt'

Test Data

For information about test data format and how to obtain or generate test data, please see data/README.md.

License

This project is dual licensed under the Apache/MIT license.

Dependencies

~2.5–3.5MB
~56K SLoC