#random-access #set #integer #bounded #performance #unsigned-integer

bin+lib fastset

Fast set implementation for dense, bounded integer collections, optimized for quick updates and access

8 unstable releases (3 breaking)

0.4.1 Apr 5, 2024
0.4.0 Apr 1, 2024
0.3.0 Mar 23, 2024
0.2.1 Mar 21, 2024
0.1.2 Mar 13, 2024

#721 in Data structures

Download history 19/week @ 2024-07-23 16/week @ 2024-07-30 2/week @ 2024-09-17 13/week @ 2024-09-24

555 downloads per month
Used in signvec

MIT license

125KB
2K SLoC

fastset

Crates.io docs.rs License GitHub Workflow Status

Fast set implementation for dense, bounded integer collections, offering quick updates and random access.

Features

  • Tailored for unsigned integer (usize) elements, ideal for index-based applications
  • Fast insertion, removal, and membership check
  • random method for uniform random sampling
  • Paging mechanism to somewhat mitigate the large memory footprint[^1]

Note that while paging improves the existing memory footprint, fastset::Set is still not a good solution for memory constrained applications or for applications with storage need for sparse elements spread over an extended range. For integers twice as sparse as the page size, the fastset::Set with paging has peak heap allocation ~ 8x that of std::collections::HashSet.

[^1]: A paging mechanism is introduced in 0.4.0 that reduces the memory-footprint of fastset::Set. With the paging feature, fastset::Set achieves ~ 50% reduction in peak heap memory allocations with no additional performance overhead.

Benchmarks

Operation fastset::Set hashbrown::HashSet std::collections::HashSet
insert 1.1632 ns 4.7105 ns 14.136 ns
remove 1.1647 ns 3.0459 ns 10.625 ns
contains 932.81 ps 985.38 ps 13.827 ns
random 651.26 ps N/A N/A
  • CPU: AMD Ryzen™ 5 5600G with Radeon™ Graphics x 12
  • RAM: 58.8 GiB
  • OS: Guix System, 64-bit

Usage

use fastset::{set, Set};
use nanorand::WyRand;

   let mut set = set![5, 10, 15, 20, 25, 30]; // Initialize set with elements
   assert!(set.contains(&5)); // Check for element presence

   set.insert(35); // Insert a new element
   assert!(set.contains(&35));

   set.remove(&5); // Remove an element
   assert!(!set.contains(&5));

   if let Some(taken) = set.take(&10) { // Remove and return an element
       assert_eq!(taken, 10);
   }

   let mut rng = WyRand::new();
   if let Some(element) = set.random(&mut rng) { // Get a random element
       set.remove(&element); // Remove the randomly selected element
       assert!(!set.contains(&element));
   }

   println!("Set: {:?}, Length: {}", set, set.len()); // Display the set and its length

Delphic Sets

fastset::Set, as implemented here, meets the conditions for being a Delphic set [1], [2]:

Let Ω be a discrete universe. A set (S ⊆ Ω) is considered a member of a Delphic family if it supports the following operations within O(log |Ω|) time:

  • Membership: Verify if any element (x ∈ Ω) exists within (S).
  • Cardinality: Determine the size of (S), i.e., (|S|).
  • Sampling: Draw a uniform random sample from (S).

A unit test in src/set.rs verifies the uniform sampling property with a basic Chi-squared test.

use fastset::Set;
use nanorand::WyRand;
use statrs::distribution::{ChiSquared, ContinuousCDF};

fn sampling_is_uniformly_at_random() {
    const SAMPLES: usize = 1_000_000;
    const EDGE_OF_THE_UNIVERSE: usize = 10000;

    let elements = (1..=EDGE_OF_THE_UNIVERSE).collect::<Vec<_>>();
    let set = Set::from(elements.clone());
    let mut rng = WyRand::new_seed(42u64);
    let mut counts = vec![0f64; elements.len()];

(0..SAMPLES).for_each(|_| {
   if let Some(value) = set.random(&mut rng) {
       counts[value - 1] += 1.0;
   }
});

    let e = SAMPLES as f64 / elements.len() as f64;
    let statistic: f64 = counts.iter().map(|&o| { (o - e) * (o - e) / e }).sum();

    let dof = elements.len() - 1;
    let chi = ChiSquared::new(dof as f64).unwrap();
    let acceptable = chi.inverse_cdf(0.99);

    // Null hypothesis: Elements are sampled uniformly at random
    assert!(
        statistic < acceptable,
        "Chi-square statistic {} is greater than what's acceptable ({})",
        statistic,
        acceptable,
    );
}

References

[1]: Chakraborty, Sourav, N. V. Vinodchandran, and Kuldeep S. Meel. "Distinct Elements in Streams: An Algorithm for the (Text) Book." arXiv preprint arXiv:2301.10191 (2023).

[2]: Meel, Kuldeep S., Sourav Chakraborty, and N. V. Vinodchandran. "Estimation of the Size of Union of Delphic Sets: Achieving Independence from Stream Size." Proceedings of the 41st ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems. 2022.

Dependencies

~0.4–1MB
~21K SLoC