#bloom-filter #cuckoo-filter #hash-values #quotient-filter #rsqf #cqf

qfilter

Efficient bloom filter like datastructure, based on the Rank Select Quotient Filter (RSQF)

9 releases

0.2.1 Jul 27, 2024
0.2.0 Jul 22, 2024
0.1.6 Aug 2, 2023
0.1.5 Jul 16, 2023
0.1.1 Dec 4, 2022

#114 in Data structures

Download history 8446/week @ 2024-07-29 6631/week @ 2024-08-05 6682/week @ 2024-08-12 9188/week @ 2024-08-19 7320/week @ 2024-08-26 8998/week @ 2024-09-02 7466/week @ 2024-09-09 4887/week @ 2024-09-16 7746/week @ 2024-09-23 6521/week @ 2024-09-30 7374/week @ 2024-10-07 8932/week @ 2024-10-14 8978/week @ 2024-10-21 7761/week @ 2024-10-28 8052/week @ 2024-11-04 4508/week @ 2024-11-11

29,476 downloads per month
Used in qfilter

MIT license

80KB
1.5K SLoC

Qfilter

Efficient bloom filter like data structure, based on the Rank Select Quotient Filter (RSQF).

This is a small and flexible general-purpose AMQ-Filter. It not only supports approximate membership testing like a bloom filter but also deletions, merging, resizing and serde serialization.

  • High performance
  • Supports removals
  • Extremely compact, more so than comparable filters
  • Can be created with a initial small capacity and grow as needed
  • (De)Serializable with serde
  • Portable Rust implementation
  • Only verifiable usages of unsafe

This data structure is similar to a hash table that stores fingerprints in a very compact way. Fingerprints are similar to a hash values, but are possibly truncated. The reason for false positives is that multiple items can map to the same fingerprint. For more information see the quotient filter Wikipedia page that describes a similar but less optimized version of the data structure. The actual implementation is based on the Rank Select Quotient Filter (RSQF).

The public API also exposes a fingerprint API, which can be used to succinctly store u64 hash values.

Example

let mut f = qfilter::Filter::new(1000000, 0.01);
for i in 0..1000 {
    f.insert(i).unwrap();
}
for i in 0..1000 {
    assert!(f.contains(i));
}

Hasher

The hashing algorithm used is xxhash3 which offers both high performance and stability across platforms.

Filter size

For a given capacity and error probability the RSQF may require significantly less space than the equivalent bloom filter or other AMQ-Filters.

Bits per item Error probability when full Bits per item (Cont.) Error (cont.)
3.125 0.362 19.125 6.87e-06
4.125 0.201 20.125 3.43e-06
5.125 0.106 21.125 1.72e-06
6.125 0.0547 22.125 8.58e-07
7.125 0.0277 23.125 4.29e-07
8.125 0.014 24.125 2.15e-07
9.125 0.00701 25.125 1.07e-07
10.125 0.00351 26.125 5.36e-08
11.125 0.00176 27.125 2.68e-08
12.125 0.000879 28.125 1.34e-08
13.125 0.000439 29.125 6.71e-09
14.125 0.00022 30.125 3.35e-09
15.125 0.00011 31.125 1.68e-09
16.125 5.49e-05 32.125 8.38e-10
17.125 2.75e-05 .. ..
18.125 1.37e-05 .. ..

Compatibility between versions 0.1 and 0.2

Version 0.2 changed public APIs (e.g. fallible constructors) which required a major version bump.

Serialization is bidirectionally compatible between versions 0.1 and 0.2.

Not implemented

  • Fingerprint attached values
  • Counting with fingerprint values, not fingerprint duplication
  • More advanced growth strategies (InfiniFilter).

Legacy x86_64 CPUs support

The implementation assumes the popcnt instruction (equivalent to integer.count_ones()) is present when compiling for x86_64 targets. This is theoretically not guaranteed as the instruction is only available on AMD/Intel CPUs released after 2007/2008. If that's not the case the Filter constructor will panic.

Support for such legacy x86_64 CPUs can be optionally enabled with the legacy_x86_64_support which incurs a ~10% performance penalty.

License

This project is licensed under the MIT license.

Dependencies

~100–400KB