3 unstable releases
0.2.0 | Oct 22, 2020 |
---|---|
0.1.1 | Oct 19, 2020 |
0.1.0 | Oct 19, 2020 |
refset - non-owning HashSet
A hash-set analogue that does not own its data.
It can be used to "mark" items without the need to transfer ownership to the set.
Example use case
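// Assumes `HashRefSet` is exported at the crate root.
use refset::HashRefSet;

/// Process arguments while ignoring duplicates
fn process_args(args: impl IntoIterator<Item = String>) {
    // Stores only a hash of each argument, never the argument itself
    let mut same = HashRefSet::new();
    for argument in args.into_iter() {
        if !same.insert(argument.as_str()) {
            // Already processed this input, ignore
            continue;
        }
        // do work...
    }
}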
Serialisation support with serde crate
`HashRefSet` and `HashType` both implement `Serialize` and `Deserialize` from the `serde` crate if the `serde` feature is enabled. By default it is not.
Hashing
The implementation currently hashes every item with SHA512. I may add the ability to choose a different hashing algorithm, but for now I think this is sufficient.
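A rough sketch of the idea, not the crate's actual code: the set keeps only a fixed-size digest of each item, so the item itself is never stored or borrowed past the call. To stay dependency-free, this sketch swaps in the standard library's 64-bit `DefaultHasher` where the crate uses SHA512; `digest_of` and `mark` are hypothetical names used only for illustration.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

/// Reduce any hashable value to a fixed-size digest.
/// Stand-in: a 64-bit `DefaultHasher`, NOT the SHA512 digest the crate uses.
fn digest_of<T: Hash + ?Sized>(value: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}

/// Record `value` as seen; returns `true` if its digest was not already present.
fn mark<T: Hash + ?Sized>(seen: &mut HashSet<u64>, value: &T) -> bool {
    seen.insert(digest_of(value))
}

fn main() {
    let mut seen = HashSet::new();
    let owned = String::from("input.txt");

    assert!(mark(&mut seen, owned.as_str())); // first sighting
    assert!(!mark(&mut seen, owned.as_str())); // same digest again

    // The set holds only a digest, so `owned` is still free to be moved.
    let _moved: String = owned;
}
```

A wider digest such as SHA512 costs more to compute per insert but makes accidental collisions far less likely than with a 64-bit hash.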
Drawbacks
Since the item itself is not stored, we cannot use `Eq` to double-check that a match is not a hash collision.
While the hashing algorithm used (SHA512) is extremely unlikely to produce collisions, especially for small data types, keep in mind that it is not infallible.
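To make the failure mode concrete, here is a small sketch that replaces the hash with a deliberately weak 8-bit byte sum so that a collision is easy to trigger. `weak_digest` and `mark` are hypothetical and have nothing to do with the crate's SHA512 hashing; only the shape of the problem is the same.

```rust
use std::collections::HashSet;

/// Deliberately weak 8-bit digest (wrapping sum of bytes) so that
/// collisions are easy to trigger. Hypothetical; not the crate's hash.
fn weak_digest(value: &str) -> u8 {
    value.bytes().fold(0u8, |acc, b| acc.wrapping_add(b))
}

/// Digest-only insert: returns `false` when the digest is already present,
/// whether that is a true duplicate or a collision.
fn mark(seen: &mut HashSet<u8>, value: &str) -> bool {
    seen.insert(weak_digest(value))
}

fn main() {
    let mut seen = HashSet::new();
    assert!(mark(&mut seen, "ab"));

    // "ba" is a different string with the same byte sum, so it is wrongly
    // reported as already present. No item is stored to compare with `Eq`.
    assert!(!mark(&mut seen, "ba"));
    assert_ne!("ab", "ba");
}
```

With a 512-bit digest an accidental collision like this is not something you would expect to see in practice, but if one did occur the set would have no way to notice it.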
Speed
`HashRefSet` is significantly slower than `HashSet`, so `HashSet` should be preferred in most cases.
Even when `Clone` is required to insert into `HashSet`, it can be ~10x faster for trivial data structures.
`HashRefSet` should be used if `Clone` is either not an option, or `Clone` is a significantly heavy operation on the type you're inserting.
Benchmark | Tests | Result |
---|---|---|
owning_strings | Inserts `String` into `HashSet` by cloning | ~4,538 ns/iter |
non_owning_strings | Inserts `str` into `HashRefSet` by reference | ~48,271 ns/iter |
owning_ints | Inserts `u32` into `HashSet` by copy | ~937 ns/iter |
non_owning_ints | Inserts `&u32` into `HashRefSet` by reference | ~31,089 ns/iter |
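If you want to measure this on your own data, a simple wall-clock harness along these lines may be enough for a first impression. It is only a sketch: the function names mirror the table but the bodies are my guess at what such a benchmark does, and the non-owning side uses the standard library's `DefaultHasher` as a stand-in, which is much cheaper than SHA512, so it will not reproduce the gap in the table. Swap in the real `HashRefSet` to compare properly.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};
use std::time::{Duration, Instant};

/// Insert by cloning each string into an owning `HashSet`.
fn owning_strings(items: &[String]) -> usize {
    let mut set = HashSet::new();
    for item in items {
        set.insert(item.clone());
    }
    set.len()
}

/// Insert only a digest of each string (stand-in for a non-owning set).
fn non_owning_strings(items: &[String]) -> usize {
    let mut set = HashSet::new();
    for item in items {
        let mut hasher = DefaultHasher::new();
        item.as_str().hash(&mut hasher);
        set.insert(hasher.finish());
    }
    set.len()
}

/// Run `f` `iters` times and return the mean wall-clock time per run.
fn time_per_iter(iters: u32, mut f: impl FnMut() -> usize) -> Duration {
    let start = Instant::now();
    for _ in 0..iters {
        // Keep the optimiser from discarding the work.
        std::hint::black_box(f());
    }
    start.elapsed() / iters
}

fn main() {
    let items: Vec<String> = (0..100).map(|i| format!("item-{i}")).collect();
    let owning = time_per_iter(1_000, || owning_strings(&items));
    let non_owning = time_per_iter(1_000, || non_owning_strings(&items));
    println!("owning_strings:     {owning:?}/iter");
    println!("non_owning_strings: {non_owning:?}/iter");
}
```

Build with `--release` before reading anything into the numbers; debug builds distort hashing and allocation costs considerably.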
When to use over HashSet
- The type you're inserting needs to be both in the set and moved elsewhere. (see the example above)
- Simply using `Clone` to insert a copy of the item into a `HashSet` is not possible (non-`Clone` type) or is a significantly heavy operation. (see the benchmarks above)
- The fallibility of potential (albeit extremely unlikely) collisions of the SHA512 algorithm is not a concern
- You need to insert an unsized type into a `HashSet`
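The first and last points can be illustrated with a small stand-in type. This is a sketch under assumptions, not the crate's implementation: `MarkSet` and `dedup_names` are hypothetical names, and the hash is the standard library's `DefaultHasher` rather than SHA512. It shows an unsized `str` being marked by reference, after which the owning `String` is moved elsewhere without a clone.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};
use std::marker::PhantomData;

/// Hypothetical digest-only set; `T` may be unsized (`str`, `[u8]`, ...).
/// Stand-in hash: 64-bit `DefaultHasher`, not the crate's SHA512.
struct MarkSet<T: ?Sized> {
    digests: HashSet<u64>,
    _item: PhantomData<fn(&T)>,
}

impl<T: Hash + ?Sized> MarkSet<T> {
    fn new() -> Self {
        MarkSet { digests: HashSet::new(), _item: PhantomData }
    }

    /// Returns `true` if `item` had not been marked before.
    fn insert(&mut self, item: &T) -> bool {
        let mut hasher = DefaultHasher::new();
        item.hash(&mut hasher);
        self.digests.insert(hasher.finish())
    }
}

/// Keep the first occurrence of each name, moving the owned strings out.
fn dedup_names(names: Vec<String>) -> Vec<String> {
    let mut seen: MarkSet<str> = MarkSet::new();
    let mut unique = Vec::new();
    for name in names {
        if seen.insert(name.as_str()) {
            unique.push(name); // moved, not cloned; the set never owned it
        }
    }
    unique
}

fn main() {
    let names = vec!["a".to_string(), "b".to_string(), "a".to_string()];
    assert_eq!(dedup_names(names), vec!["a".to_string(), "b".to_string()]);
}
```

An owning `HashSet<String>` would need either a clone per insert or a second pass to get the values back out.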
Smallmap implementation
With the `smallmap` feature enabled, the `small` module also provides the same API as `HashRefSet` via `SmallRefMap`.
It is backed by `smallmap::Map` instead of `HashSet`, which may or may not affect performance and memory usage.
The hashing algorithm and usage are otherwise identical for now, but this may change.
Benchmarks of SmallRefMap
These benchmarks compare against cloning or copying into `smallmap::Map`.
The performance penalties are largely the same as in the table above, with very minor differences.
Benchmark | Tests | Result |
---|---|---|
owning_strings | Inserts `String` into `SmallMap` by cloning | ~3,096 ns/iter |
non_owning_strings | Inserts `str` into `SmallRefMap` by reference | ~47,302 ns/iter |
owning_ints | Inserts `u32` into `SmallMap` by copy | ~316 ns/iter |
non_owning_ints | Inserts `&u32` into `SmallRefMap` by reference | ~30,046 ns/iter |
However, each page of a `SmallRefMap` will consume at least 16 KB of memory.
This may not be very desirable, but the feature is available if you want it.
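One plausible way to arrive at a figure of that size is sketched below. Both inputs are assumptions that this README does not confirm: that a page holds 256 slots (one per possible byte value), and that each key is a full 64-byte SHA512 digest. `SLOTS_PER_PAGE`, `Digest` and `page_bytes` are hypothetical names for this back-of-the-envelope check.

```rust
use std::mem::size_of;

/// ASSUMPTION: a page holds one slot per possible byte value.
const SLOTS_PER_PAGE: usize = 256;

/// ASSUMPTION: each key is a full SHA512 digest (64 bytes).
type Digest = [u8; 64];

/// Bytes needed for one page of optional `(key, value)` slots.
fn page_bytes<K, V>() -> usize {
    SLOTS_PER_PAGE * size_of::<Option<(K, V)>>()
}

fn main() {
    // A set stores no value alongside the key, hence `()`.
    let bytes = page_bytes::<Digest, ()>();
    println!("{bytes} bytes per page");
    assert!(bytes >= 16 * 1024);
}
```

Under those assumptions the keys alone account for 256 × 64 = 16,384 bytes per page, before any per-slot overhead.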
License
MIT