8 unstable releases (3 breaking)
0.4.1 | Apr 5, 2024 |
---|---|
0.4.0 | Apr 1, 2024 |
0.3.0 | Mar 23, 2024 |
0.2.1 | Mar 21, 2024 |
0.1.2 | Mar 13, 2024 |
#721 in Data structures
555 downloads per month
Used in signvec
125KB
2K
SLoC
fastset
Fast set implementation for dense, bounded integer collections, offering quick updates and random access.
Features
- Tailored for unsigned integer (
usize
) elements, ideal for index-based applications - Fast insertion, removal, and membership check
random
method for uniform random sampling- Paging mechanism to somewhat mitigate the large memory footprint[^1]
Note that while paging improves the existing memory footprint,
fastset::Set
is still not a good solution for memory constrained applications
or for applications with storage need for sparse elements spread over an extended range.
For integers twice as sparse as the page size, the fastset::Set
with paging
has peak heap allocation ~ 8x that of std::collections::HashSet
.
[^1]: A paging mechanism is introduced in 0.4.0
that reduces the memory-footprint of fastset::Set
.
With the paging feature, fastset::Set
achieves ~ 50% reduction in peak heap memory allocations
with no additional performance overhead.
Benchmarks
Operation | fastset::Set |
hashbrown::HashSet |
std::collections::HashSet |
---|---|---|---|
insert | 1.1632 ns | 4.7105 ns | 14.136 ns |
remove | 1.1647 ns | 3.0459 ns | 10.625 ns |
contains | 932.81 ps | 985.38 ps | 13.827 ns |
random | 651.26 ps | N/A | N/A |
- CPU: AMD Ryzen™ 5 5600G with Radeon™ Graphics x 12
- RAM: 58.8 GiB
- OS: Guix System, 64-bit
Usage
use fastset::{set, Set};
use nanorand::WyRand;
let mut set = set![5, 10, 15, 20, 25, 30]; // Initialize set with elements
assert!(set.contains(&5)); // Check for element presence
set.insert(35); // Insert a new element
assert!(set.contains(&35));
set.remove(&5); // Remove an element
assert!(!set.contains(&5));
if let Some(taken) = set.take(&10) { // Remove and return an element
assert_eq!(taken, 10);
}
let mut rng = WyRand::new();
if let Some(element) = set.random(&mut rng) { // Get a random element
set.remove(&element); // Remove the randomly selected element
assert!(!set.contains(&element));
}
println!("Set: {:?}, Length: {}", set, set.len()); // Display the set and its length
Delphic Sets
fastset::Set
, as implemented here, meets the conditions for being a Delphic set [1], [2]:
Let Ω be a discrete universe. A set (S ⊆ Ω) is considered a member of a Delphic family if it supports the following operations within O(log |Ω|) time:
- Membership: Verify if any element (x ∈ Ω) exists within (S).
- Cardinality: Determine the size of (S), i.e., (|S|).
- Sampling: Draw a uniform random sample from (S).
A unit test in src/set.rs
verifies the uniform sampling property with a basic Chi-squared test.
use fastset::Set;
use nanorand::WyRand;
use statrs::distribution::{ChiSquared, ContinuousCDF};
fn sampling_is_uniformly_at_random() {
const SAMPLES: usize = 1_000_000;
const EDGE_OF_THE_UNIVERSE: usize = 10000;
let elements = (1..=EDGE_OF_THE_UNIVERSE).collect::<Vec<_>>();
let set = Set::from(elements.clone());
let mut rng = WyRand::new_seed(42u64);
let mut counts = vec![0f64; elements.len()];
(0..SAMPLES).for_each(|_| {
if let Some(value) = set.random(&mut rng) {
counts[value - 1] += 1.0;
}
});
let e = SAMPLES as f64 / elements.len() as f64;
let statistic: f64 = counts.iter().map(|&o| { (o - e) * (o - e) / e }).sum();
let dof = elements.len() - 1;
let chi = ChiSquared::new(dof as f64).unwrap();
let acceptable = chi.inverse_cdf(0.99);
// Null hypothesis: Elements are sampled uniformly at random
assert!(
statistic < acceptable,
"Chi-square statistic {} is greater than what's acceptable ({})",
statistic,
acceptable,
);
}
References
[1]: Chakraborty, Sourav, N. V. Vinodchandran, and Kuldeep S. Meel. "Distinct Elements in Streams: An Algorithm for the (Text) Book." arXiv preprint arXiv:2301.10191 (2023).
[2]: Meel, Kuldeep S., Sourav Chakraborty, and N. V. Vinodchandran. "Estimation of the Size of Union of Delphic Sets: Achieving Independence from Stream Size." Proceedings of the 41st ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems. 2022.
Dependencies
~0.4–1MB
~21K SLoC