2 releases (uses the new Rust 2024 edition)

| Version | Released |
|---|---|
| 0.1.1 (new) | Mar 27, 2025 |
| 0.1.0 | Mar 27, 2025 |
Stringleton
Extremely efficient string interning solution for Rust crates.
String interning: The technique of representing all strings which are equal by a pointer or ID that is unique to the contents of that string, such that an O(n) string equality check becomes an O(1) pointer equality check.
Interned strings in Stringleton are called "symbols", in the tradition of Ruby.
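To make the definition above concrete, here is a minimal sketch of string interning using only the standard library. `Interner` and `Sym` are hypothetical names invented for this illustration; the sketch is not thread-safe and is not how Stringleton itself stores symbols.

```rust
use std::collections::HashSet;

/// Hypothetical minimal interner, for illustration only.
#[derive(Default)]
pub struct Interner {
    // Each distinct string is leaked exactly once so that it lives for `'static`.
    strings: HashSet<&'static str>,
}

/// Hypothetical symbol type: equality compares pointers, not contents.
#[derive(Clone, Copy, Debug)]
pub struct Sym(&'static str);

impl PartialEq for Sym {
    fn eq(&self, other: &Self) -> bool {
        // O(1): after interning, equal contents always share one allocation.
        std::ptr::eq(self.0, other.0)
    }
}

impl Eq for Sym {}

impl Sym {
    pub fn as_str(self) -> &'static str {
        self.0
    }
}

impl Interner {
    pub fn intern(&mut self, s: &str) -> Sym {
        // Already known: hand back the existing allocation.
        if let Some(&existing) = self.strings.get(s) {
            return Sym(existing);
        }
        // First occurrence: leak a copy. It is never freed.
        let leaked: &'static str = Box::leak(s.to_owned().into_boxed_str());
        self.strings.insert(leaked);
        Sym(leaked)
    }
}
```

Interning `"hello"` twice through the same `Interner` yields two `Sym` values that compare equal by pointer, while `"hello"` and `"world"` compare unequal without their contents being read.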
Distinguishing characteristics
- Ultra fast: Getting the string representation of a `Symbol` is a lock-free memory load. No reference counting or atomics involved.
- Symbol literals (`sym!(...)`) are "free" at the call-site. Multiple invocations with the same string value are eagerly reconciled on program startup using linker tricks.
- Symbols are tiny. Just a single pointer - 8 bytes on 64-bit platforms.
- Symbols are trivially copyable - no reference counting.
- No size limit - symbol strings can be arbitrarily long (i.e., this is not a "small string optimization" implementation).
- Debugger friendly: If your debugger is able to display a plain Rust `&str`, it is capable of displaying `Symbol`.
- Dynamic library support: Symbols can be passed across dynamic linking boundaries (terms and conditions apply - see the documentation of `stringleton-dylib`).
- `no_std` support: `std` synchronization primitives used in the symbol registry can be replaced with `once_cell` and `spin`. See below for caveats.
- `serde` support - symbols are serialized/deserialized as strings.
- Fast bulk-insertion of symbols at runtime.
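The "single pointer" and "trivially copyable" properties can be sketched with plain Rust types. The layout below (a thin pointer to a statically stored `&str`) is an assumption made for illustration, and `ThinSym` is a hypothetical name; the text above only states that a symbol is one pointer wide.

```rust
/// Hypothetical thin symbol: a pointer to a `&'static str` stored elsewhere.
/// This layout is an assumption for illustration, not a documented guarantee.
#[derive(Clone, Copy)]
pub struct ThinSym(&'static &'static str);

impl ThinSym {
    pub const fn new(slot: &'static &'static str) -> Self {
        ThinSym(slot)
    }

    pub fn as_str(self) -> &'static str {
        // A single pointer dereference: no lock and no reference count.
        *self.0
    }
}

/// Returns (size of the thin symbol, size of a plain `&str`) in bytes.
pub fn sizes() -> (usize, usize) {
    (
        std::mem::size_of::<ThinSym>(),
        std::mem::size_of::<&'static str>(),
    )
}
```

A plain `&str` is a pointer plus a length, so it is two words wide, whereas a pointer to a stored `&str` is one word. That is 8 bytes on 64-bit platforms.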
Good use cases
- You have lots of little strings that you need to frequently copy and compare.
- Your strings come from trusted sources.
- You want good debugger support for your symbols.
Bad use cases
- You have an unbounded number of distinct strings, or strings coming from untrusted sources. Since symbols are never garbage-collected, this is a source of memory leaks, which is a denial-of-service hazard.
- You need a bit-stable representation of symbols that does not change between runs.
- Consider if `smol_str` or `cowstr` is a better fit for such use cases.
Usage
Add `stringleton` as a dependency of your project, and then you can do:
```rust
use stringleton::{sym, Symbol};

// Enable the `sym!()` macro in the current crate. This should go at the crate root.
stringleton::enable!();

fn main() {
    // Symbols can be created from identifiers...
    let foo: Symbol = sym!(foo);
    let foo2 = sym!(foo);
    let bar = sym!(bar);
    // ...or from string literals.
    let message = sym!("Hello, World!");
    let message2 = sym!("Hello, World!");

    assert_eq!(foo, foo2);
    assert_eq!(bar.as_str(), "bar");
    assert_eq!(message, message2);
    // Equal symbols point at the same string data.
    assert_eq!(message.as_str().as_ptr(), message2.as_str().as_ptr());
}
```
Crate features
- std (enabled by default): Use synchronization primitives from the standard library. Implies `alloc`. When disabled, `critical-section` and `spin` must both be enabled (see below for caveats).
- alloc (enabled by default): Support creating symbols from `String`.
- serde: Implements `serde::Serialize` and `serde::Deserialize` for symbols, which will be serialized/deserialized as plain strings.
- debug-assertions: Enables expensive debugging checks at runtime - mostly useful to diagnose problems in complicated linker scenarios.
- critical-section: When `std` is not enabled, this enables `once_cell` as a dependency with the `critical-section` feature enabled. Only relevant in `no_std` environments. See `critical-section` for more details.
- spin: When `std` is not enabled, this enables `spin` as a dependency, which is used to obtain global read/write locks on the symbol registry. Only relevant in `no_std` environments (and is a pessimization in other environments).
Efficiency
Stringleton tries to be as efficient as possible, but it may make different tradeoffs than other string interning libraries. In particular, Stringleton is optimized towards making the use of the `sym!(...)` macro practically free.

Consider this function:

```rust
fn get_symbol() -> Symbol {
    sym!("Hello, World!")
}
```

This compiles into a single load instruction. Using `cargo disasm` on x86-64 (Linux):

```asm
get_symbol:
    8bf0  mov rax, qword ptr [rip + 0x52471]
    8bf7  ret
```
This is "as fast as it gets", but the price is that all symbols in the program are deduplicated when the program starts. Any theoretically faster solution would need fairly deep cooperation from the compiler aimed at this specific use case.
Also, symbol literals are always a memory load. The compiler cannot perform optimizations based on the contents of symbols, because it doesn't know how they will be reconciled until link time. For example, while `sym!(a) != sym!(a)` is always false, the compiler cannot eliminate code paths relying on that.
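For contrast, here is a sketch of a lazily initialized call-site cache built from `std::sync::OnceLock`. This is explicitly not what Stringleton does. `intern` and `get_symbol_lazy` are hypothetical names, and the interner is a stand-in. The point is that a lazy cache has to check whether it has been initialized on every call, whereas reconciling all symbols at program startup leaves only the load.

```rust
use std::collections::HashSet;
use std::sync::{Mutex, OnceLock};

/// Hypothetical global interner (a stand-in, not Stringleton's registry).
pub fn intern(s: &'static str) -> &'static str {
    static SET: OnceLock<Mutex<HashSet<&'static str>>> = OnceLock::new();
    let mut set = SET
        .get_or_init(|| Mutex::new(HashSet::new()))
        .lock()
        .unwrap();
    // Return the first string registered with these contents, if any.
    if let Some(&existing) = set.get(s) {
        return existing;
    }
    set.insert(s);
    s
}

/// Lazy call-site cache: the first call interns, later calls reuse the result.
pub fn get_symbol_lazy() -> &'static str {
    static CACHE: OnceLock<&'static str> = OnceLock::new();
    // Every call first checks (atomically) whether `CACHE` is initialized,
    // and only then loads the cached pointer.
    *CACHE.get_or_init(|| intern("Hello, World!"))
}
```

The per-call check is cheap, but it is more than the single `mov` shown above.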
Dynamic libraries
Stringleton relies on magical linker tricks (supported by `linkme` and `ctor`) to minimize the cost of the `sym!(...)` macro at runtime. These tricks are broadly compatible with dynamic libraries, but there are a few caveats:
- When a Rust `dylib` crate appears in the dependency graph, and it has `stringleton` as a dependency, things should "just work", due to Rust's linkage rules.
- When a Rust `cdylib` crate appears in the dependency graph, Cargo seems to be a little less clever, and the `cdylib` dependency may need to use the `stringleton-dylib` crate instead. Due to Rust's linkage rules, this will cause the "host" crate to also link dynamically with Stringleton, and everything will continue to work.
- When a library is loaded dynamically at runtime, and it does not appear in the dependency graph, the "host" crate must be prevented from linking statically to `stringleton`, because it would either cause duplicate symbol definitions, or worse, the host and client binaries would disagree about which `Registry` to use. To avoid this, the host binary can use `stringleton-dylib` explicitly instead of `stringleton`, which forces dynamic linkage of the symbol registry.
- Dynamically unloading libraries is extremely risky (`dlclose()` and similar). Unloading a library that has any calls to the `sym!(..)` or `static_sym!(..)` macros is instant UB. Such a library can in principle use `Symbol::new()`, but probably not `Symbol::new_static()`.
To summarize:
- When no dynamic libraries are present in the project, it is always best to use `stringleton` directly.
- When only normal Rust dynamic libraries (`crate-type = ["dylib"]`) are present, it is also fine to use `stringleton` directly - Cargo and rustc will figure out how to link things correctly.
- `cdylib` dependencies should use `stringleton-dylib`. The host can use `stringleton`.
- When loading dynamic libraries at runtime, both sides should use `stringleton-dylib` instead of `stringleton`.
- Do not unload dynamic libraries at runtime unless you are really, really sure what you are doing.
`no_std` caveats
Stringleton works in `no_std` environments, but it does fundamentally require two things:
- Allocator support, in order to maintain the global symbol registry. This is a `hashbrown` hash map.
- Some synchronization primitives to control access to the global symbol registry when new symbols are created.

The latter can be supported by the `spin` and `critical-section` features:
- `spin` replaces `std::sync::RwLock`, and is almost always a worse choice when `std` is available.
- `critical-section` replaces `std::sync::OnceLock` with `once_cell::sync::OnceCell`, and enables the `critical-section` feature of `once_cell`. Using `critical-section` requires additional work, because you must manually link in a crate that provides the relevant synchronization primitive for the target platform.
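The roles of the two primitives named above can be sketched with their `std` versions: a `OnceLock` initializes the global registry once, and an `RwLock` guards it afterwards. This is a hedged illustration, not Stringleton's registry. `REGISTRY` and `intern` are hypothetical names, and a `std` `HashSet` stands in for the `hashbrown` map.

```rust
use std::collections::HashSet;
use std::sync::{OnceLock, RwLock};

// Hypothetical global registry, created on first use.
static REGISTRY: OnceLock<RwLock<HashSet<&'static str>>> = OnceLock::new();

/// Hypothetical runtime interning function.
pub fn intern(s: &str) -> &'static str {
    let registry = REGISTRY.get_or_init(|| RwLock::new(HashSet::new()));

    // Fast path: a shared read lock suffices when the string is already known.
    if let Some(&existing) = registry.read().unwrap().get(s) {
        return existing;
    }

    // Slow path: take the exclusive write lock to insert a new string.
    let mut guard = registry.write().unwrap();
    // Re-check, since another thread may have inserted between the two locks.
    if let Some(&existing) = guard.get(s) {
        return existing;
    }
    // Leak a copy so it lives for `'static`. It is never freed.
    let leaked: &'static str = Box::leak(s.to_owned().into_boxed_str());
    guard.insert(leaked);
    leaked
}
```

In a `no_std` build, these are the two pieces that the `critical-section` and `spin` features would supply in place of `std::sync::OnceLock` and `std::sync::RwLock`.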
Do not use these features unless you are familiar with the tradeoffs.
Name
The name is a portmanteau of "string" and "singleton".
`lib.rs`:
Dynamic linking support for Stringleton.
This crate always produces a dynamic library, and it should be used by any crate that ends up being a `cdylib`. When this appears somewhere in the dependency graph, it causes the Rust compiler to produce a dynamic version of `stringleton-registry`, which means that both uses of `stringleton` and `stringleton-dylib` use the same symbol registry, so `Symbol`s can be safely passed across the dynamic linking boundary.

The host crate can safely use `stringleton` as a dependency, except when dynamic libraries using `stringleton-dylib` are loaded at runtime (i.e., Rust cannot know that `stringleton-registry` should be dynamically linked). In that case, the host crate should specify this crate as its dependency instead of `stringleton`.