2 releases
0.1.1 | Sep 8, 2023 |
---|---|
0.1.0 | Sep 7, 2023 |
#2342 in Data structures
51KB
761 lines
Justly
This is a library for “justified containers”, inspired by the justified-containers library in Haskell.
Briefly, this library might help you wherever you’re using indices instead of references.
Check out the rustdoc
documentation for src/lib.rs
for the main description of Justly
and information about using it.
Also see the meta/
directory
to learn more about the structure of this repo.
lib.rs
:
Collections that don’t need extra bounds-checking.
Overview
Justly implements “justified” wrappers
for the standard Rust data structures
in std::collections
.
A justified container is one whose keys,
such as the usize
indices into a [Vec
],
keep track of their validity.
This has a few major consequences, listed below.
Trustworthiness
Justly is built on the idea of trustworthy knowledge:
call::Call
represents a value called by a name,
which lets you encode knowledge about that named thing,
such as key::Key
, an index that you know is valid.
If you uphold the safety properties described in these APIs,
then such knowledge is called trustworthy,
and so you can rely on it to build safe abstractions.
Now unchecked indexing can be well and truly safe!
No extra checks
A “checked” method is one that performs bounds checking, and an “unchecked” method is one that unsafely omits it. Methods in Justly that safely omit bounds checks are called check-free. The Rust compiler attempts to elide checks when it can prove that they will always succeed. Justly implements check-free APIs by giving you the tools to prove that checks can be skipped.
It’s worth noting that some checks are strictly necessary! For example, you must do a runtime test to tell whether an arbitrary integer index is a valid key. Justly only lets you avoid extra checks, which are not strictly necessary, yet the compiler cannot prove it. But once you have gotten a typed key, it proves that the test has already been done, and doesn’t need to be repeated.
Also, most indices that you get from the collection APIs are already known to be valid keys, so Justly merely adds that information in its wrapper APIs.
No invalid indices
When referring to elements in a collection, Rust encourages using indices, relative to the collection, instead of absolute pointers. When a container is mutable, mutating methods can reallocate the container’s storage if its capacity is exceeded, thereby invalidating all pointers to that storage. So relative indexing makes it easier to uphold memory safety compared to absolute addressing, because indices automatically respond to capacity changes.
However, relative indices still don’t automatically handle changes in size: any method that decreases the size of the container must naturally invalidate any indices referring to elements outside of its new bounds.
Worse, since indices are just bare integers,
they come with no guarantees about the relationship
between the index and the value at that index.
For example, suppose we have a vector, v
,
and we compute some indices into it, i
and j
.
let mut v = vec![1, 7, 16, 27, 40, 55];
let i = v.iter().enumerate().position(|(k, &v)| k * 10 == v).unwrap();
let j = v.iter().enumerate().position(|(k, &v)| k * 11 == v).unwrap();
println!("{}", v[i]); // valid (40)
println!("{}", v[j]); // valid (55)
If we remove()
an element,
then this can easily break code
that makes assumptions about the indices:
remove()
not only makes j
inaccessible,
but also silently changes the meaning of i
.
This is like a use-after-free error.
v.remove(2);
println!("{}", v[i]); // logic error (55)
println!("{}", v[j]); // runtime error (panic)
And of course if we push()
an element,
the problem is compounded:
it silently makes j
accessible again,
but with a different meaning as well.
v.push(68);
println!("{}", v[j]); // logic error (68)
So Justly wraps those mutating methods that could invalidate the keys of a collection in consuming methods that give back a modified object, along with information about how the new keys are related to the old ones.
We do this with phantom type parameters, so there is no runtime cost.
About this documentation
Wrapper types and methods are annotated with tags
that describe their purpose, which are marked
with curly braces {…}
for easier searching.
{free}
Marks methods that used to require bounds-checking on the original type, but are now check-free.
{just}
On a container type, this says that the type is a wrapper for a plain container where some methods use typestate to add safety.
On a method, it marks places where we change the signature, typically either:
-
Taking ownership of
self
to replace amut
method on the wrapped type with one that tracks knowledge about the container -
Adding wrappers such as
key::Key
to record some knowledge about the result
{like}
Links to the original wrapped thing.
{safe}
Marks things that used to be unsafe
(usually unchecked)
but are now safe (implying check-freedom).
TODO
Allow naming unsized things
call::Call
and link::Link
ask for T: Sized
, but they don’t really need to,
since all the other fields are meant to be phantoms.
Allocator support
To support allocators,
there should be a type parameter on the wrappers,
such as A = std::alloc::Global
.
Implicit dereferencing
We’re uneasy about whether to add implementations
of Deref
and DerefMut
that would let a value be implicitly called by a name,
like these for Vec
:
impl<'name, T> Deref for Vec<'name, T> {
type Target = Call<'name, vec::Vec<T>>;
fn deref(&self) -> &Call<'name, vec::Vec<T>> {
&self.same
}
}
impl<'name, T> DerefMut for Vec<'name, T> {
fn deref_mut(
&mut self,
) -> &mut Call<'name, vec::Vec<T>> {
&mut self.same
}
}
Tracking capacity
At the time of this writing,
these APIs only track the validity of indices.
This means that if std::collections
has a mutating method
that doesn’t change any indices,
we can just forward it and keep the same API,
which is likely better for usability.
However, in principle
we could just as well track the capacity too.
This would reflect
the std::collections
guarantees about allocation
at the type level,
making it possible in some circumstances
to statically guarantee that code will not allocate.
But it would come at the ergonomic cost
of using a different API
for functions that might change the capacity.
Static, type-level names for runtime values.
Names are represented using lifetime variables.
For more on why, see [mod@crate::call
].
Helper for unique names.
name ≡ value
relationships:
define a unique static name for a dynamic value.
To make a new name, we have to make up a new type-level constant. which is both statically known and unique.
Often a “type constant” wants to be an existential type.
However, dyn Trait
types aren’t statically known,
and impl Trait
types aren’t necessarily unique.
Happily, we can write rank-1 existential quantification
using rank-2 universal quantification,
which is allowed for lifetime variables.
So we can write the name as a lifetime. That also lets us skip writing it sometimes, thanks to lifetime elision.
Gives names to values.
name × value
relationships:
pair a static name with a dynamic value.
Common properties of collections.
{just}
{like}:[mod@std::vec
]
{like}:[mod@std::slice
]
Vec<'name, T>
is a thin wrapper
around a named vector, Call<'name, std::vec::Vec<T>>
,
which you can make
using the crate::called::Called
trait.
As is typical for this library,
the Vec
wrapper replaces some methods
with versions that track the state of the container
to prove that unchecked indexing is safe,
and offer some additional features enabled by this.
The types Slice
and MutSlice
are similar thin wrappers for slices, and likewise
ConstPtr
and MutPtr
for pointers.
{like}:std::collections::vec_deque
{like}:std::collections::linked_list
{like}:std::collections::hash_map
{like}:std::collections::btree_map
{like}:std::collections::binary_heap
Indices known to be valid keys in collections.
Static proofs.