3 releases (breaking)
Version | Released |
---|---|
0.3.0 | Jan 10, 2023 |
0.2.0 | Apr 8, 2021 |
0.1.0 | Jan 15, 2021 |
#2203 in Parser implementations
2,070 downloads per month
16KB
161 lines
bytes-cast
Safely re-interpreting `&[u8]` bytes as custom structs without copying, for efficiently reading structured binary data.
Credits
This crate contains code derived from https://github.com/Lokathor/bytemuck.
Problem statement
When reading a file in a given format from disk, “traditional” parsing techniques, such as with the `nom` crate, typically involve creating a different data structure in memory, where allocation and copying can be costly.
For binary formats amenable to this, it can be more efficient to keep a bytes buffer in memory in the same format as on disk, possibly memory-mapped directly by the kernel, and only access parts of it as needed. But doing this entirely with manual index or pointer manipulation can be error-prone.
By defining `struct`s whose memory layout matches the binary format, then casting pointers to manipulate references, arrays, or slices of those structs, we can let the compiler do most of the offset computations and have much more readable code.
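As a rough illustration of the idea, the sketch below describes a made-up 6-byte record header as a struct and reads it in place. The names `RecordHeader`, `payload_len`, and `header_from_bytes` are hypothetical and are not this crate's API; the sketch only assumes the standard library.

```rust
use std::mem::{align_of, size_of};

/// Hypothetical on-disk header: a 2-byte tag followed by a 4-byte
/// little-endian length. Every field is a byte array, so the struct
/// has alignment 1 and contains no padding.
#[repr(C)]
pub struct RecordHeader {
    pub tag: [u8; 2],
    pub len_le: [u8; 4],
}

// Compile-time checks that the in-memory layout matches the format.
const _: () = assert!(align_of::<RecordHeader>() == 1);
const _: () = assert!(size_of::<RecordHeader>() == 6);

impl RecordHeader {
    /// Decode the length field from its little-endian bytes.
    pub fn payload_len(&self) -> u32 {
        u32::from_le_bytes(self.len_le)
    }
}

/// Reinterpret the start of `bytes` as a header, without copying.
/// Returns `None` when fewer than 6 bytes are available.
pub fn header_from_bytes(bytes: &[u8]) -> Option<&RecordHeader> {
    if bytes.len() < size_of::<RecordHeader>() {
        return None;
    }
    // SAFETY: the length was checked above; `RecordHeader` is `repr(C)`,
    // has alignment 1, no padding, and consists only of byte arrays, so
    // every bit pattern is a valid value. The returned reference borrows
    // from `bytes`, so it cannot outlive the buffer.
    Some(unsafe { &*(bytes.as_ptr() as *const RecordHeader) })
}
```

With manual indexing the caller would have to remember that the length lives at `bytes[2..6]`; with the struct, `header.payload_len()` lets the compiler compute that offset.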
Issues and checking
- Some Rust types have validity constraints and must not be cast from arbitrary bytes. For example, creating a `bool` whose value in memory is not `0_u8` or `1_u8` is Undefined Behavior. The same applies to `enum`s.
- When `align_of` for a type is greater than one, accessing values of that type at addresses that are not a multiple of the alignment is Undefined Behavior. Alignment can also cause structs to have padding, making field offsets not what we might expect. Instead, we can make helper types that wrap, for example, `[u8; 4]` and convert to/from `u32`.
- Binary formats for storage or transmission typically mandate one of little-endian or big-endian. Helper types again can take care of conversion to and from the CPU’s native endianness.
- By default the Rust compiler can choose to reorder struct fields (in order to reduce padding). This again can make field offsets not what we’d expect. This can be disabled by marking a struct with `#[repr(C)]` or `#[repr(transparent)]`.
This crate combines Rust’s checks for all of the above at compile time. See the documentation for API details.
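The helper-type approach from the list above can be sketched with the standard library alone. The types `U32Le`, `U32Be`, `RawBool`, and `Entry` below are illustrative names chosen for this example, not necessarily the types this crate exports.

```rust
use std::mem::{align_of, size_of};

/// A `u32` stored as little-endian bytes: alignment 1, every bit
/// pattern is valid.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
#[repr(transparent)]
pub struct U32Le([u8; 4]);

impl From<u32> for U32Le {
    fn from(v: u32) -> Self {
        U32Le(v.to_le_bytes())
    }
}

impl From<U32Le> for u32 {
    fn from(v: U32Le) -> Self {
        u32::from_le_bytes(v.0)
    }
}

/// The big-endian counterpart.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
#[repr(transparent)]
pub struct U32Be([u8; 4]);

impl From<u32> for U32Be {
    fn from(v: u32) -> Self {
        U32Be(v.to_be_bytes())
    }
}

impl From<U32Be> for u32 {
    fn from(v: U32Be) -> Self {
        u32::from_be_bytes(v.0)
    }
}

/// A byte that is only interpreted as a `bool` after validation,
/// so arbitrary input can never produce an invalid `bool`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
#[repr(transparent)]
pub struct RawBool(u8);

impl RawBool {
    pub fn get(self) -> Option<bool> {
        match self.0 {
            0 => Some(false),
            1 => Some(true),
            _ => None,
        }
    }
}

/// `repr(C)` keeps the declared field order. Because every field has
/// alignment 1, there is no padding: 1 + 4 + 4 = 9 bytes.
/// (With a plain `u32` field, typical targets would insert padding
/// after `flag`.)
#[repr(C)]
pub struct Entry {
    pub flag: RawBool,
    pub offset: U32Le,
    pub size: U32Be,
}

const _: () = assert!(align_of::<Entry>() == 1);
const _: () = assert!(size_of::<Entry>() == 9);
```

The two `const` assertions fail the build, rather than a test at runtime, if a later change to `Entry` introduces padding or a field with larger alignment.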
Why another crate
`bytemuck` and other projects already exist with very similar goals. This crate makes some different design choices and is opinionated in some ways:
- It only converts from `&[u8]` bytes and does not try to be more general or accommodate many use cases.
- Providing more bytes than necessary is not an error. Instead the start of the slice is re-interpreted, and the remaining bytes are part of the return value for further processing. (The caller can check or assert `remaining.is_empty()` if an exact length is desired.)
- It mandates `align_of() == 1` at compile-time instead of checking pointer alignment at runtime, removing one category of panics or errors that needs to be handled. Not enough bytes is the only error case. Requiring fields with `align_of() == 1` also removes any padding in structs.
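These three choices can be sketched together in a few lines. The trait `PlainBytes`, the error type `NotEnoughBytes`, and the struct `Pair` are hypothetical stand-ins written for this example; the crate's real trait, derive macro, and error type may differ, so consult its documentation for the actual API.

```rust
use std::mem::{align_of, size_of};

/// The only failure mode: the input slice is too short.
#[derive(Debug, PartialEq, Eq)]
pub struct NotEnoughBytes;

/// Hypothetical marker trait for types that can be viewed in place.
///
/// # Safety
/// Implementors must be `repr(C)` or `repr(transparent)`, contain no
/// padding, and be valid for every possible bit pattern.
pub unsafe trait PlainBytes: Sized {
    /// Evaluated at compile time for each type that calls `from_bytes`;
    /// the build fails if the alignment is not 1.
    const ALIGN_IS_ONE: () = assert!(align_of::<Self>() == 1);

    /// Reinterpret the start of `bytes` as `Self` and return the
    /// remaining bytes alongside it.
    fn from_bytes(bytes: &[u8]) -> Result<(&Self, &[u8]), NotEnoughBytes> {
        // Referencing the constant forces the alignment check.
        let () = Self::ALIGN_IS_ONE;
        if bytes.len() < size_of::<Self>() {
            return Err(NotEnoughBytes);
        }
        let (head, rest) = bytes.split_at(size_of::<Self>());
        // SAFETY: `head` is exactly `size_of::<Self>()` bytes long, the
        // alignment is 1 so any address is suitably aligned, and the
        // trait's contract guarantees every bit pattern is valid.
        let value = unsafe { &*(head.as_ptr() as *const Self) };
        Ok((value, rest))
    }
}

/// Example record: a 1-byte key and a 2-byte little-endian value.
#[repr(C)]
pub struct Pair {
    pub key: u8,
    pub value: [u8; 2],
}

// SAFETY: `repr(C)`, only `u8` and byte-array fields, no padding.
unsafe impl PlainBytes for Pair {}
```

Because the remainder is returned, consecutive records can be read by feeding each call's leftover slice into the next one, and a caller that wants an exact length can check `rest.is_empty()` at the end.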
Dependencies
~1.5MB
~37K SLoC