10 releases (5 breaking)

| Version | Date |
| --- | --- |
| 0.6.1 | Oct 2, 2023 |
| 0.6.0 | Apr 2, 2023 |
| 0.5.1 | Mar 30, 2023 |
| 0.5.0 | Mar 14, 2021 |
| 0.1.0 | Jun 9, 2020 |
#643 in Rust patterns · 137 downloads per month · Used in 8 crates (3 directly) · 54KB · 1K SLoC
Interning library based on atomic reference counting
For the docs see docs.rs.
When developing, please make use of `cargo bench` and `cargo +nightly miri`; the multithreading tests are sized such that they bring issues to the surface when run under miri (which is rather slow).
Once you notice problems, debugging is easier with:

```sh
export MIRIFLAGS=-Zmiri-disable-isolation
cargo +nightly miri test --features println -- --nocapture multithreading_hash
```
Another very useful tool is loom; run it with:

```sh
LOOM_LOG=1 LOOM_LOCATION=1 LOOM_CHECKPOINT_INTERVAL=1 RUST_BACKTRACE=1 RUSTFLAGS="--cfg loom" cargo test --lib
```

Note: loom currently does not work because it doesn't support `Weak` references.
License
At your option: Apache-2.0 or MIT
lib.rs:

Interning library based on atomic reference counting
This library differs from `arc-interner` (which served as initial inspiration) in that:

- interners must be created and can be dropped (for convenience functions see below)
- it does not dispatch based on `TypeId` (each interner is for exactly one type)
- it offers both `Hash`-based and `Ord`-based storage
- it handles unsized types without overhead, so you should intern `str` instead of `String` (see the sketch below)
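As a sketch of that last point, using the global convenience pool described further down (and assuming `InternedHash` is exported at the crate root, as `InternedOrd` is):

```rust
use intern_arc::{global::hash_interner, InternedHash};

// Unsized target: the characters are stored once, directly in the interned allocation.
let s: InternedHash<str> = hash_interner().intern_ref("hello world");

// Sized target: this interns the `String` handle itself, keeping an extra level
// of indirection alive, which is usually not what you want for text.
let owned: InternedHash<String> = hash_interner().intern_sized("hello world".to_string());
```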
Unfortunately, this combination of features makes the use of unsafe Rust unavoidable. The construction of unsized values has been adapted from the standard library's `Arc` type; reference counting employs a `parking_lot` mutex since it must also ensure consistent access to and usage of the cleanup function that will remove the value from its interner. The test suite also passes under miri to check against some classes of undefined behavior in the unsafe code (including memory leaks).
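For intuition, here is a conceptual sketch (deliberately not this crate's code: it uses only std types, sized values, and a plain `Mutex`) of why the final count decrement and the interner cleanup must be coordinated under a single lock:

```rust
use std::collections::HashSet;
use std::hash::Hash;
use std::sync::{Arc, Mutex};

/// Toy interner: the set keeps one `Arc` per distinct value.
struct Interner<T: Hash + Eq> {
    set: Mutex<HashSet<Arc<T>>>,
}

/// Handle to an interned value. This sketch omits `Clone` to keep the argument
/// simple; all strong-count changes therefore happen under the interner's lock.
struct Handle<T: Hash + Eq> {
    value: Option<Arc<T>>, // Option so Drop can release the Arc while holding the lock
    interner: Arc<Interner<T>>,
}

impl<T: Hash + Eq> Interner<T> {
    fn new() -> Arc<Self> {
        Arc::new(Interner { set: Mutex::new(HashSet::new()) })
    }

    fn intern(this: &Arc<Self>, value: T) -> Handle<T> {
        let mut set = this.set.lock().unwrap();
        let existing = set.get(&value).cloned();
        let arc = match existing {
            Some(arc) => arc, // already interned: reuse the stored allocation
            None => {
                let arc = Arc::new(value);
                set.insert(arc.clone()); // one copy stays in the set
                arc
            }
        };
        Handle { value: Some(arc), interner: this.clone() }
    }
}

impl<T: Hash + Eq> Drop for Handle<T> {
    fn drop(&mut self) {
        let mut set = self.interner.set.lock().unwrap();
        let arc = self.value.take().expect("present until drop");
        // Exactly two strong references means: the set's copy plus ours, so we are
        // the last user and must remove the value from its interner. If the count
        // were decremented outside the lock, two handles dropped concurrently could
        // each observe a count of 3, skip the removal, and leave the value interned
        // forever.
        if Arc::strong_count(&arc) == 2 {
            set.remove(&arc);
        }
        drop(arc); // decrement while still holding the lock
    }
}

fn main() {
    let strings = Interner::new();
    let a = Interner::intern(&strings, "hello".to_string());
    let b = Interner::intern(&strings, "hello".to_string());
    // Both handles share one allocation; dropping both removes it from the set.
    assert!(Arc::ptr_eq(a.value.as_ref().unwrap(), b.value.as_ref().unwrap()));
    drop(a);
    drop(b);
    assert!(strings.set.lock().unwrap().is_empty());
}
```

The real crate additionally has to build its `Arc`-like allocation by hand in order to support unsized types such as `str`, which is where the unsafe code comes in.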
API flavors

The same API is provided in two flavors:

- `HashInterner` uses hash-based storage, namely the standard `HashSet`
- `OrdInterner` uses ord-based storage, namely the standard `BTreeSet`
Interning small values takes of the order of 100–200ns on a typical server CPU. The ord-based storage has an advantage when interning large values (like slices greater than 1kB). The hash-based storage has an advantage when keeping lots of values (many thousands and up) interned at the same time.
Nothing beats your own benchmarking, though.
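Numbers like these depend heavily on the workload, so a quick sanity check on your own data can be as simple as the rough timing loop sketched below. It is only a sketch: it uses the global `hash_interner()` pool introduced in the next section, and the crate's own `cargo bench` suite gives more reliable figures.

```rust
use std::time::Instant;

use intern_arc::global::hash_interner;

fn main() {
    // After the first iteration the value is already interned, so this mostly
    // measures the dedup hit path plus the global pool lookup per call.
    let n = 100_000u32;
    let started = Instant::now();
    let handles: Vec<_> = (0..n).map(|_| hash_interner().intern_ref("hello")).collect();
    println!("small str: {:?} per intern", started.elapsed() / n);
    drop(handles);

    let large = "x".repeat(4096);
    let n = 10_000u32;
    let started = Instant::now();
    let handles: Vec<_> = (0..n).map(|_| hash_interner().intern_ref(large.as_str())).collect();
    println!("4 kiB str: {:?} per intern", started.elapsed() / n);
    drop(handles);
}
```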
Convenience access to interners
When employing interning, for example within a dedicated deserialiser thread, it is best to create and locally use an interner, avoiding further synchronisation overhead. You can also store interners in thread-local variables if you only care about deduplication per thread (a sketch of this follows at the end of this section).
That said, this crate also provides convenience functions based on a global type-indexed pool:
```rust
use intern_arc::{global::hash_interner};

let i1 = hash_interner().intern_ref("hello"); // -> InternedHash<str>
let i2 = hash_interner().intern_sized(vec![1, 2, 3]); // -> InternedHash<Vec<i32>>
```
You can also use the type-indexed pool yourself to control its lifetime:
```rust
use intern_arc::{global::OrdInternerPool, InternedOrd};

let mut pool = OrdInternerPool::new();
let i: InternedOrd<[u8]> = pool.get_or_create().intern_box(vec![1, 2, 3].into());
```
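For the thread-local pattern mentioned above, a minimal sketch along the same lines (assuming `OrdInternerPool` behaves as in the previous example) could look like this:

```rust
use std::cell::RefCell;

use intern_arc::{global::OrdInternerPool, InternedOrd};

thread_local! {
    // One type-indexed pool per thread: deduplication happens per thread, and
    // interning itself involves no cross-thread synchronisation on the pool.
    static POOL: RefCell<OrdInternerPool> = RefCell::new(OrdInternerPool::new());
}

fn intern_bytes(bytes: Vec<u8>) -> InternedOrd<[u8]> {
    POOL.with(|pool| pool.borrow_mut().get_or_create().intern_box(bytes.into()))
}

fn main() {
    let a = intern_bytes(vec![1, 2, 3]);
    let b = intern_bytes(vec![1, 2, 3]);
    // Within one thread both calls yield handles to the same interned slice.
    drop((a, b));
}
```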
Dependencies
~0.4–24MB
~343K SLoC