9 releases (breaking)

0.8.0 Sep 5, 2024
0.7.0 Sep 5, 2024
0.6.0 Sep 5, 2024
0.5.0 Aug 14, 2024
0.1.1 Feb 25, 2024

#341 in Concurrency

MIT license

28KB
599 lines

folklore

A lock-free concurrent hash map, based on the 'folklore' implementation described in "Concurrent Hash Tables: Fast and General(?)!" by Maier et al.

What?

This has some major limitations compared to a more general hash-map implementation. Namely;

  • It cannot be grown past its initial capacity.
  • The capacity is limited to i16::MAX.
  • It can only store values which are exactly 2 bytes.
  • Removals are not (currently) supported (because of the immense slowdown caused by tombstones filling up the map).

The only benefits are:

  • Blazingly fast 🔥 for concurrent access / modification.
  • Can be shared safely across threads without requiring Mutex, RwLock etc.

This is kind of just a fun project exploring the implementation of something I read about in an academic paper. I wouldn't really recommend using it.

How?

Map entries are a 16-bit key offset, a 16-bit value, and a 32-bit key hash. This means that any operation on a map entry can be completed with a single 64-bit (1 word) CAS instruction.

The actual map entries store a "key offset" rather than a key, because the keys are allocated in a separate store. The key store is a "ConcurrentArray" which is lock-free and safe for concurrent access, but entries are immutable, and can only be removed if they were the most recently added.

Consistency

Loads and Stores generally use Ordering::Acquire and Ordering::Release respectively. Initial lookup for an entry uses Ordering::Relaxed for performance reasons, so sometimes a newly inserted key might be missed by another thread. However, that thread will never overwrite the key, because a stronger ordering is used for the actual insertion.

Performance

Some basic benchmarks are included in this repo which compare against std::collections::HashMap and leapfrog::LeapMap. There are a set of benchmarks for single-thread, and a set for multi-thread. Here are the numbers I got on an M1 Pro MacBook:

Single-threaded

Map Time Throughput
std HashMap 7.9036ms 49.750 Melem/s
leapfrog LeapMap 8.8983ms 44.189 Melem/s
folklore HashMap 4.9738ms 79.055 Melem/s

Multi-threaded (8 threads)

Map Time Throughput
std HashMap (RWLock) 58.689ms 53.599 Melem/s
leapfrog LeapMap 18.841ms 166.96 Melem/s
folklore HashMap 16.571ms 189.83 Melem/s

The numbers of each benchmark are pretty useless on their own, but comparing them we can see that folklore manages to just about beat out leapfrog. Again these benchmarks are very basic, only testing insertion and updating.

Inspired by the ConcurrentMap implementation in couchbase/fleece.

robclu/leapfrog was instrumental in my understanding how this kinda thing should be written in Rust. The leapfrog map also doesn't suffer from many of the limitations of this map, and is a much better choice in most cases.

Dependencies

~310KB