#string-interning #interning #memoization #string-cache #intern #cache

interned

Provides a generic Interned<T> which can intern practically any type including &str, slices, and primitives, plus memoization via Memoized<I, T>

7 releases

0.1.6 Aug 28, 2023
0.1.5 Aug 20, 2023
0.1.1 Jul 22, 2023

#93 in Caching

Download history 20/week @ 2024-07-27 5/week @ 2024-09-21

55 downloads per month

MIT license

67KB
1K SLoC

Crates.io docs.rs Build Status MIT License

Interned provides highly optimized, thread-local, generic interning via Interned<T> and a memoization layer built on top of this interning layer, provided by Memoized<I, T>, which can cache the result of an arbitrary input I: Hash and intern this result in the underlying interning layer.

Blanket implementations supporting T are provided for all primitives, slices of Sized T (including &[u8]), as well as str slices (&str). Support for additional arbitrary types can be added by implementing DataType, Staticize, and Hash. str slices have a custom implementation since they are the only built-in unsized type with slice support.

All values are heap-allocated 'statics and benefit from TypeId-specific locality of reference in the heap. Any two Interned<T> instances that have the same value of T will be guaranteed to point to the same memory address in the heap. Among other things, this allows for O(1) (in the size of the data) equality comparisons since the heap addresses are compared instead of having to compare the underlying data bit-by-bit. This makes interned types especially suited for parsing and similar low-entropy data tasks.

A caveat of the 'static lifetime and immutability of the underlying heap data is that unique values of Interned<T> and Memoized<I, T> leak in the sense that they can never be de-allocated. This allows us to implement Copy on all interned types, because we can rely on the heap pointer to continue existing for the life of the program once it has been created for a particular value. For this reason, you should not use this crate for long-running programs that will encounter an unbounded number of unique values, such as those created by an unending stream of user input.

Because the internal size of an Interned<T> on the stack is the size of a usize (pointer) plus a u64 (cached hash code), it would be silly to use Interned<T> with integer types directly, however it makes sense to do so for the purposes of memoizing an expensive computation via Memoized<I, T>.

An interned string type, InStr, is also provided as a convenient wrapper around Interned<&'static str>. It has a number of extra impls and should be your go-to type if you want to work with interned strings.

Interned Example

#[test]
fn test_interned_showcase() {
    let a: Interned<i32> = 1289.into();
    let b = Interned::from(1289);
    let c: Interned<i32> = 47.into();
    assert_eq!(a, b);
    assert_ne!(a, c);
    assert_eq!(a.as_ptr(), b.as_ptr());
    assert_ne!(b.as_ptr(), c.as_ptr());
    let d: Interned<&str> = "asdf".into();
    assert_ne!(d, "fdsa".into());
    assert_eq!(Interned::from("asdf"), d);
    let e = Interned::from([1, 2, 3, 4, 5].as_slice());
    let f = InStr::from("abc");
    let g: InStr = "abc".into();
    assert_eq!(f, g);
    assert_eq!(f.as_ptr(), g.as_ptr());
    assert_eq!(e, [1, 2, 3, 4, 5].as_slice().into());
    assert_ne!(e, [4, 1, 7].as_slice().into());
    assert_eq!(format!("{b:?}"), "Interned<i32> { value: 1289 }");
    assert_eq!(format!("{d:?}"), "Interned<&str> { str: \"asdf\" }");
    assert_eq!(e[3], 4);
    assert_eq!(e[0], 1);
    assert_eq!(
        format!("{e:?}"),
        "Interned<&[i32]> { slice: [1, 2, 3, 4, 5] }"
    );
}

Memoized Examples

#[test]
fn test_memoized_basic() {
    let initial_interned = num_interned::<usize>();
    let initial_memoized = num_memoized::<usize>();
    let a = Memoized::from("scope a", "some_input", |input| input.len().into());
    let b = Memoized::from("scope a", "other", |input| input.len().into());
    assert_ne!(a, b);
    let c = Memoized::from("scope a", "some_input", |input| input.len().into());
    assert_eq!(a, c);
    assert_ne!(b, c);
    assert_eq!(a.as_value(), &10);
    assert_ne!(*a.as_value(), 11);
    assert_eq!(*b.interned().interned_value(), 5);
    assert_eq!(*c.as_value(), 10);
    assert_eq!(num_interned::<usize>(), initial_interned + 2);
    assert_eq!(num_memoized::<usize>(), initial_memoized + 2);
}

The following demonstrates how "scopes" work with Memoized:

#[test]
fn test_memoized_showcase() {
    fn expensive_fn(a: usize, b: usize, c: usize) -> String {
        format!("{}", a * a + b * b + c * c)
    }
    let a = Memoized::from("my_scope", (1, 2, 3), |tup: (usize, usize, usize)| {
        expensive_fn(tup.0, tup.1, tup.2).as_str().into()
    });
    assert_eq!(a.as_str(), "14");
}

Dependencies

~0–7MB
~48K SLoC