5 releases
0.0.4 | May 6, 2023 |
---|---|
0.0.3 | Oct 14, 2022 |
0.0.2 | Sep 10, 2021 |
0.0.1 | Sep 9, 2021 |
0.0.0 | Sep 5, 2021 |
#54 in No standard library
225KB
4K
SLoC
🌎🔒 interlock
Experimental, requires unstable compiler features
Readers-writer locks designed for locking intervals. #![no_std]
compatible.
use std::pin::Pin;
use interlock::{hl::slice::SyncRbTreeVecIntervalRwLock, state};
let vec = Box::pin(SyncRbTreeVecIntervalRwLock::new(vec![0u8; 64]));
let vec = vec.as_ref();
// Borrow `vec[0..32]`
state!(let mut state);
let guard1 = vec.read(0..32, (), state);
// Borrow `vec[16..32]`
state!(let mut state);
let guard2 = vec.read(16..32, (), state);
// Mutably borrow `vec[16..48]` unsuccessfully
state!(let mut state);
vec.try_write(16..48, Pin::as_mut(&mut state)).unwrap_err();
// Unborrow `vec[0..32]` completely
drop(guard1);
drop(guard2);
// Mutably borrow `vec[16..48]`
vec.try_write(16..48, Pin::as_mut(&mut state)).unwrap();
use parking_lot::RawMutex;
use interlock::{hl::slice::AsyncRbTreeVecIntervalRwLock, state};
#[tokio::main]
async fn main() {
let vec = Box::pin(AsyncRbTreeVecIntervalRwLock::<RawMutex, _>::new(vec![0u8; 64]));
let vec = vec.as_ref();
state!(let mut state);
let _guard = vec.async_read(0..32, (), state).await;
state!(let mut state);
let _guard = vec.async_read(16..48, (), state).await;
}
Design
Red-black tree based implementation
Locking Interface | Storage | Type Alias |
---|---|---|
Fallible + Panicking, !Sync |
&mut [T] |
crate::hl::slice::LocalRbTreeSliceRefIntervalRwLock |
↑ | Vec<T> |
crate::hl::slice::LocalRbTreeVecIntervalRwLock |
Fallible + Blocking | &mut [T] |
crate::hl::slice::SyncRbTreeSliceRefIntervalRwLock |
↑ | Vec<T> |
crate::hl::slice::SyncRbTreeVecIntervalRwLock |
Fallible + Future -oriented |
&mut [T] |
crate::hl::slice::AsyncRbTreeSliceRefIntervalRwLock |
↑ | Vec<T> |
crate::hl::slice::AsyncRbTreeVecIntervalRwLock |
They are modeled as an array of virtual readers-writer locks. When locking, the virtual locks in the specified range are locked in ascending order. When unlocking, they are unlocked in descending order. (The locking order is important to prevent deadlocks.) The wait list of each virtual lock is ordered by (priority, sequence)
, where sequence
is a monotonically increasing number (thus enforcing FIFO ordering). When a virtual lock is unlocked, all entries in the wait list are examined and resumed instantly (if possible) in order.
On a lock conflict, the fallible interface fails and leaves all virtual locks unchanged. Other interfaces maintain the incomplete borrow until it's cancelled or completed by the removal of a conflicting borrow.
The Future
-oriented interface supports cancellation.
The worst-case time complexity is shown below:
Operation | Time Complexity |
---|---|
Locking | O(log(existing_borrows)) |
Unlocking | O((1 + potentially_resumed_borrows) * log(existing_borrows)) |
The space complexity is O(existing_borrows)
.
Cargo features
std
enables the items that depend onstd
oralloc
.alloc
enables the items that depend onalloc
. This currently requires a target withusize
atomics support becausestable_deref_trait/alloc
unconditionally implementsStableDeref
onArc
, which is gated bycfg(target_has_atomic = "ptr")
.async
enables theFuture
-oriented API. This currently requires a target with load/store atomics support. Whenlock_api
issue #277 is resolved, this requirement will be lifted, and this Cargo feature will be deprecated.
Future directions
- More thorough testing is probably required with respect to untrusted trait
impl
s and various kinds of safety. - The
Mutex
andReentrantMutex
counterparts should be provided in addition to that ofRwLock
. - Variants with simpler algorithms, such as linked lists, should be provided.
- The dependency on unstable compiler features should preferably be removed at some point.
- The hack for sound intrusive data structures (
EarlyDropGuard
) should be removed when Rust's memory model is adjusted to properly accommodate such structures.
Alternatives
interlock
is surprisingly niche. While it may scale well for large-scale inputs, its complexity and overhead are likely to outweigh the benefit in many practical situations. Consider the following alternatives before choosing interlock
:
-
Dividing slices: The language and the standard library provide many ways to divide
&mut [T]
. For example:- Pattern matching (e.g.,
if let [first, rest @ ..] ...
) can separate the first or last few elements from the rest of a given slice. slice::iter_mut
creates a set of&mut T
, which can be used separately.slice::chunks_mut
breaks&mut [T]
into chunks.slice::split_at_mut
divides&mut [T]
into two. An arbitrary number of pieces can be created by repeating this process.
- Pattern matching (e.g.,
-
No ordering requirements, no reference forming: If borrows can be unordered in respect to other borrows, getting access to
&[T]
or&mut [T]
is not necessary, and...- Single-threaded: ... the slice
[T]
is only referenced by a single thread, then consider wrapping individual elements withCell
<T>
. You can turn&mut [T]
into&[Cell<T>]
byCell::as_slice_of_cells
. - Integers: ... the elements are of a primitive integer type, then consider changing the element type to
std::sync::atomic
::Atomic*
and accessing them withOrdering::Relaxed
.
- Single-threaded: ... the slice
-
Short-lived borrows: If each borrow is expected to be short-lived, consider wrapping the whole slice with a
Mutex
orRwLock
.interlock
uses aMutex
to protect internal structures, so it incurs the overhead ofMutex
plus bookkeeping. -
Parallel data processing: Consider using Rayon for parallel computation.
-
Per-element or per-segment borrows: If you only need to borrow individual elements or segments (i.e., if it's acceptable to get
&mut T
or&mut [T; SEGMENT_LEN]
instead of a more general, contiguous&mut [T]
), consider wrapping individual elements or segments withparking_lot
::{Mutex, RwLock}
, which only take one byte and one word of space, respectively. In some cases,spin
might be a more preferable choice.- Suppose you replace
N
borrows fromMutex
with one borrow fromSyncRbTreeSliceIntervalRwLock
. While the latter's locking time doesn't depend onN
, due its inherent complexity, ifN
is too small, you might actually lose performance. According to some single-threaded micro-benchmark conducted on an AMD Zen processor, the threshold seems to be somewhere aroundN = 6
. Expect a higher threshold in a real application because of cache effects and lock contention.
- Suppose you replace
-
Single processor, CPU-bound: If only one processor accesses the slice simultaneously, and it's expected that the processor is fully occupied while the slice is borrowed, consider wrapping the whole slice with a
Mutex
orRwLock
. In such cases, being able to borrow disjoint subslices simultaneously doesn't improve the overall throughput.
Dependencies
~1–1.7MB
~33K SLoC