3 unstable releases

0.2.0 Oct 22, 2020
0.1.1 Oct 19, 2020
0.1.0 Oct 19, 2020

#1986 in Algorithms

MIT license

26KB
575 lines

refset - non-owning HashSet

A hash-set analogue that does not own its data.

It can be used to "mark" items without the need to transfer ownership to the map

Example use case

/// Process arguments while ignoring duplicates
fn process_args(args: impl IntoIterator<Item=String>) {
  let mut same=  HashRefSet::new();
  for argument in args.into_iter()
  {
    if !same.insert(argument.as_str()) {
      // Already processed this input, ignore
      continue;
    }
    //do work...
  }
}

Serialisation support with serde crate

HashRefSet and HashType both implement Serialize and Deserialize from the serde crate if the serde feature is enabled. By default it is not.

Hashing

We use the SHA512 hashing algorithm for the implementation at present. I may implement the ability to choose different types, but as of now I think it is sufficient.

Drawbacks

Since the item is not inserted itself, we cannot use Eq to double check there was not a hash collision. While the hashing algorithm used (Sha512) is extremely unlikely to produce collisions, especially for small data types, keep in mind that it is not infallible.

Speed

HashRefSet is significantly slower than HashSet, so HashSet should be preferred in most cases. Even when Clone is required to insert into HashSet, it can be ~10x faster for trivial data structures. HashRefSet should be used if Clone is either not an option, or Clone is a significantly heavy operation on the type you're inserting.

Benchmark Tests Result
owning_strings Inserts String into HashSet by cloning ~4,538 ns/iter
non_owning_strings Inserts str into HashRefSet by reference ~48,271 ns/iter
owning_ints Inserts u32 into HashSet by copy ~937 ns/iter
non_owning_ints Inserts &u32 into HashRefSet by reference ~31,089 ns/iter

When to use over HashSet

  • The type you're inserting needs to be both in the set and moved elsewhere. (see exmaple)
  • Simply using Clone to insert a copy of the item into a HashSet is not possible (non-Clone type) or is a significantly heavy operation. (see benchmarks)
  • The fallibility of potential (albeing extremely unlikely) collisions of the SHA512 algorithm is not a concern
  • You need to insert an unsized type into a HashSet

Smallmap implementation

With the smallmap feature enabled, the small module also provides the same API as HashRefSet via SmallRefMap. It is backed by smallmap::Map instead of HashSet, which could potentially have some performance or memory usage impacts, or not.

The hashing algorithm and usage is otherwise identical for now, but this may change.

Benchmarks of SmallRefMap

Comparing with cloning or copying into smallmap::Map.

Largely there are the same performance penalties as the above table, with very minor differences.

Benchmark Tests Result
owning_strings Inserts String into SmallMap by cloning ~3,096 ns/iter
non_owning_strings Inserts str into SmallRefMap by reference ~47,302 ns/iter
owning_ints Inserts u32 into SmallMap by copy ~316 ns/iter
non_owning_ints Inserts &u32 into SmallRefMap by reference ~30,046 ns/iter

Each page of the SmallRefMap will consume at least 16kb of memory however. This may not be very desireable, but is still an available feature.

License

MIT

Dependencies

~400–660KB
~15K SLoC