4 stable releases
1.1.0 | Aug 8, 2020 |
---|---|
1.0.2 | Jun 6, 2020 |
1.0.1 | Aug 12, 2019 |
#1641 in Data structures
16KB
166 lines
Watermark
A simple watermarking set.
A watermarking set holds any integer values, and supports two operations:
- insert(element: T)
- Inserts an item into the set
- contains(element: T)
- Checks whether an item has previously been added to the set
A watermark set works best when the "inserts" all happen, and happen "mostly" in order. For example, when keeping track of which message IDs have been seen.
Example
To make a simple idempotent data processor:
let mut ws = watermark::WatermarkSet::default();
for message in message_bus {
if !ws.contains(message.id) {
ws.insert(message.id);
// Do some work with message.data
}
}
Operation
Internally, a watermarking set contains a "watermark" and a bitvector of "recently added" items. The watermark guarantees that all items below that number have been seen, and the recently added items handles everything else. This means that if all elements eventually get added, memory usage is kept very low and membership tests are very very cheap.
Performance
This crate comes with some simple benchmarks that you can run for yourself. Here are the results, compared to a hash set, that I got on my machine:
Specs:
- AMD Ryzen 7 3700x 8-Core Processor
- 2x16gb DDR4 3200
WatermarkSet Insert/In Order
time: [7.6194 us 7.8101 us 7.9793 us]
thrpt: [125.32 Melem/s 128.04 Melem/s 131.24 Melem/s]
HashSet Insert/In Order time: [40.319 us 40.340 us 40.372 us]
thrpt: [24.770 Melem/s 24.789 Melem/s 24.802 Melem/s]
Found 8 outliers among 100 measurements (8.00%)
1 (1.00%) low severe
2 (2.00%) low mild
3 (3.00%) high mild
2 (2.00%) high severe
WatermarkSet Insert/Out of Order
time: [9.1897 us 9.4983 us 9.7602 us]
thrpt: [102.46 Melem/s 105.28 Melem/s 108.82 Melem/s]
HashSet Insert/Out of Order
time: [26.327 us 26.342 us 26.355 us]
thrpt: [37.943 Melem/s 37.962 Melem/s 37.983 Melem/s]
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild
WatermarkSet Contains/Aligned
time: [242.04 ns 242.25 ns 242.47 ns]
thrpt: [2.6396 Gelem/s 2.6419 Gelem/s 2.6442 Gelem/s]
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high severe
HashSet Contains/Aligned
time: [7.6920 us 7.7382 us 7.8144 us]
thrpt: [81.900 Melem/s 82.706 Melem/s 83.204 Melem/s]
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high severe
WatermarkSet Contains/Unaligned
time: [290.21 ns 290.39 ns 290.56 ns]
thrpt: [2.2026 Gelem/s 2.2039 Gelem/s 2.2053 Gelem/s]
Found 4 outliers among 100 measurements (4.00%)
2 (2.00%) low mild
1 (1.00%) high mild
1 (1.00%) high severe
HashSet Contains/Unaligned
time: [7.9000 us 7.9021 us 7.9043 us]
thrpt: [80.969 Melem/s 80.991 Melem/s 81.012 Melem/s]
Found 5 outliers among 100 measurements (5.00%)
1 (1.00%) low mild
3 (3.00%) high mild
1 (1.00%) high severe
Dependencies
~475KB