9 releases
0.2.1 | Jul 27, 2024 |
---|---|
0.2.0 | Jul 22, 2024 |
0.1.6 | Aug 2, 2023 |
0.1.5 | Jul 16, 2023 |
0.1.1 | Dec 4, 2022 |
#114 in Data structures
29,476 downloads per month
Used in qfilter
80KB
1.5K
SLoC
Qfilter
Efficient bloom filter like data structure, based on the Rank Select Quotient Filter (RSQF).
This is a small and flexible general-purpose AMQ-Filter. It not only supports approximate membership testing like a bloom filter but also deletions, merging, resizing and serde serialization.
- High performance
- Supports removals
- Extremely compact, more so than comparable filters
- Can be created with a initial small capacity and grow as needed
- (De)Serializable with serde
- Portable Rust implementation
- Only verifiable usages of unsafe
This data structure is similar to a hash table that stores fingerprints in a very compact way. Fingerprints are similar to a hash values, but are possibly truncated. The reason for false positives is that multiple items can map to the same fingerprint. For more information see the quotient filter Wikipedia page that describes a similar but less optimized version of the data structure. The actual implementation is based on the Rank Select Quotient Filter (RSQF).
The public API also exposes a fingerprint API, which can be used to succinctly store u64 hash values.
Example
let mut f = qfilter::Filter::new(1000000, 0.01);
for i in 0..1000 {
f.insert(i).unwrap();
}
for i in 0..1000 {
assert!(f.contains(i));
}
Hasher
The hashing algorithm used is xxhash3 which offers both high performance and stability across platforms.
Filter size
For a given capacity and error probability the RSQF may require significantly less space than the equivalent bloom filter or other AMQ-Filters.
Bits per item | Error probability when full | Bits per item (Cont.) | Error (cont.) |
---|---|---|---|
3.125 | 0.362 | 19.125 | 6.87e-06 |
4.125 | 0.201 | 20.125 | 3.43e-06 |
5.125 | 0.106 | 21.125 | 1.72e-06 |
6.125 | 0.0547 | 22.125 | 8.58e-07 |
7.125 | 0.0277 | 23.125 | 4.29e-07 |
8.125 | 0.014 | 24.125 | 2.15e-07 |
9.125 | 0.00701 | 25.125 | 1.07e-07 |
10.125 | 0.00351 | 26.125 | 5.36e-08 |
11.125 | 0.00176 | 27.125 | 2.68e-08 |
12.125 | 0.000879 | 28.125 | 1.34e-08 |
13.125 | 0.000439 | 29.125 | 6.71e-09 |
14.125 | 0.00022 | 30.125 | 3.35e-09 |
15.125 | 0.00011 | 31.125 | 1.68e-09 |
16.125 | 5.49e-05 | 32.125 | 8.38e-10 |
17.125 | 2.75e-05 | .. | .. |
18.125 | 1.37e-05 | .. | .. |
Compatibility between versions 0.1 and 0.2
Version 0.2 changed public APIs (e.g. fallible constructors) which required a major version bump.
Serialization is bidirectionally compatible between versions 0.1 and 0.2.
Not implemented
- Fingerprint attached values
- Counting with fingerprint values, not fingerprint duplication
- More advanced growth strategies (InfiniFilter).
Legacy x86_64 CPUs support
The implementation assumes the popcnt
instruction (equivalent to integer.count_ones()
) is present
when compiling for x86_64 targets. This is theoretically not guaranteed as the instruction is only
available on AMD/Intel CPUs released after 2007/2008. If that's not the case the Filter constructor will panic.
Support for such legacy x86_64 CPUs can be optionally enabled with the legacy_x86_64_support
which incurs a ~10% performance penalty.
License
This project is licensed under the MIT license.
Dependencies
~100–400KB