#binary-data #byte-slice #binary-format #binary-parser #structs #data-access #byte-buffer

bytes-cast

Safely re-interpreting &[u8] bytes as custom structs without copying, for efficiently reading structured binary data

3 releases (breaking)

0.3.0 Jan 10, 2023
0.2.0 Apr 8, 2021
0.1.0 Jan 15, 2021

#2203 in Parser implementations

2,070 downloads per month

Zlib OR Apache-2.0 OR MIT

16KB
161 lines

Credits

This crate contains code derived from https://github.com/Lokathor/bytemuck.

Problem statement

When reading a file in a given format from disk, “traditional” parsing techniques, such as with the nom crate, typically involve creating a different data structure in memory, where allocation and copying can be costly.

For binary formats amenable to this, it can be more efficient to keep in memory a bytes buffer in the same format as on disk, possibly memory-mapped directly by the kernel, and only access parts of it as needed. But doing this entirely with manual index or pointer manipulation can be error-prone.

By defining structs whose memory layout matches the binary format, then casting pointers to manipulate references, arrays, or slices of those structs, we can let the compiler do most of the offset computations and have much more readable code.
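As a minimal sketch of the idea in the paragraph above, the following uses only the standard library and is not this crate's API. The `Header` format, its field names, and the `header_from_bytes` function are hypothetical, made up for illustration. The struct is built only from `u8` and byte arrays so that every bit pattern is valid and the alignment is 1, and both properties are asserted at compile time before the pointer cast.

```rust
use std::mem::{align_of, size_of};

/// Hypothetical 8-byte header of an imaginary binary format.
/// Every field is a `u8` or an array of `u8`, so any bytes are a valid value.
#[repr(C)]
pub struct Header {
    pub magic: [u8; 4],
    pub version: u8,
    pub flags: u8,
    /// Big-endian u16, stored as raw bytes to keep the alignment at 1.
    pub entry_count_be: [u8; 2],
}

// Compile-time checks: no alignment requirement and no padding.
const _: () = assert!(align_of::<Header>() == 1);
const _: () = assert!(size_of::<Header>() == 8);

impl Header {
    /// Decode the big-endian field into a native integer.
    pub fn entry_count(&self) -> u16 {
        u16::from_be_bytes(self.entry_count_be)
    }
}

/// Re-interpret the start of `bytes` as a `Header` without copying.
/// Returns `None` when fewer than 8 bytes are available.
pub fn header_from_bytes(bytes: &[u8]) -> Option<&Header> {
    if bytes.len() < size_of::<Header>() {
        return None;
    }
    // SAFETY: `Header` is `repr(C)`, has alignment 1 and no padding (asserted
    // above), accepts any bit pattern, and `bytes` holds at least
    // `size_of::<Header>()` readable bytes that outlive the returned reference.
    Some(unsafe { &*bytes.as_ptr().cast::<Header>() })
}
```

Field access such as `header.version` or `header.entry_count()` then reads directly from the original buffer, with the offsets computed by the compiler.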

Issues and checking

  • Some Rust types have validity constraints and must not be cast from arbitrary bytes. For example, creating a bool whose value in memory is not 0_u8 or 1_u8 is Undefined Behavior. The same applies to enums whose value in memory is not one of their discriminants.

  • When align_of for a type is greater than one, accessing values of that type at addresses that are not a multiple of the alignment is Undefined Behavior. Alignment can also cause structs to have padding, making field offsets not what we might expect. Instead, we can make helper types that wrap, for example, [u8; 4] and convert to/from u32.

  • Binary formats for storage or transmission typically mandate one of little-endian or big-endian. Helper types again can take care of conversion to and from the CPU’s native endianness.

  • By default, the Rust compiler can choose to reorder struct fields (in order to reduce padding). This again can make field offsets not what we’d expect. This can be disabled by marking a struct with #[repr(C)] or #[repr(transparent)].
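The points above can be illustrated with a short standard-library-only sketch. All names here (`Padded`, `Packed`, `U32Be`, `bool_from_byte`) are stand-ins written for this example and should not be read as this crate's own types. It shows how a `u32` field introduces padding, how a `[u8; 4]` wrapper avoids it while handling endianness, and how a byte can be validated before it is treated as a `bool`.

```rust
/// With a real `u32` field, `repr(C)` inserts padding after `tag` on targets
/// where `u32` is 4-byte aligned, so `value` does not sit at offset 1.
#[repr(C)]
pub struct Padded {
    pub tag: u8,
    pub value: u32,
}

/// Helper wrapping four raw bytes that hold a big-endian `u32`.
/// Its alignment is 1, so it never introduces padding.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
#[repr(transparent)]
pub struct U32Be(pub [u8; 4]);

impl U32Be {
    /// Encode a native integer as big-endian bytes.
    pub fn new(value: u32) -> Self {
        U32Be(value.to_be_bytes())
    }

    /// Decode to the CPU's native representation.
    pub fn get(self) -> u32 {
        u32::from_be_bytes(self.0)
    }
}

/// Same fields as `Padded`, but `value` is now at offset 1 and the whole
/// struct is exactly 5 bytes with alignment 1.
#[repr(C)]
pub struct Packed {
    pub tag: u8,
    pub value: U32Be,
}

/// Checked conversion: only 0 and 1 are valid in-memory values for `bool`.
pub fn bool_from_byte(byte: u8) -> Option<bool> {
    match byte {
        0 => Some(false),
        1 => Some(true),
        _ => None,
    }
}
```

Because `Packed` contains only alignment-1 fields and is `repr(C)`, its field offsets follow the declaration order with nothing in between.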

This crate combines Rust’s checks for all of the above at compile-time. See the documentation for API details.

Why another crate

bytemuck and other projects already exist with very similar goals. This crate makes some different design choices and is opinionated in some ways:

  • It only converts from &[u8] bytes and does not try to be more general or accommodate many use cases.

  • Providing more bytes than necessary is not an error. Instead the start of the slice is re-interpreted, and the remaining bytes are part of the return value for further processing. (The caller can check or assert remaining.is_empty() if an exact length is desired.)

  • It mandates align_of() == 1 at compile-time instead of checking pointer alignment at runtime, removing one category of panics or errors that needs to be handled. Not enough bytes is the only error case. Requiring align_of() == 1 for every field also removes any padding in structs.
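The "prefix plus remainder" convention described above can be sketched in safe, standard-library-only Rust. The names `take_array`, `parse_record`, and `NotEnoughBytes`, as well as the record layout, are hypothetical and chosen for this example only; they are not taken from this crate. The point is the shape of the interface: each step consumes the start of the slice, hands back the rest, and can fail only because too few bytes were supplied.

```rust
use std::convert::TryFrom;

/// The single error case in this sketch: the input was too short.
#[derive(Debug, PartialEq, Eq)]
pub struct NotEnoughBytes {
    pub expected: usize,
    pub actual: usize,
}

/// Borrow the first `N` bytes as a fixed-size array and return the remainder.
/// Extra bytes are not an error; they are handed back to the caller.
pub fn take_array<const N: usize>(
    bytes: &[u8],
) -> Result<(&[u8; N], &[u8]), NotEnoughBytes> {
    if bytes.len() < N {
        return Err(NotEnoughBytes { expected: N, actual: bytes.len() });
    }
    let (head, rest) = bytes.split_at(N);
    // `head` has exactly `N` bytes, so this conversion cannot fail.
    let array = <&[u8; N]>::try_from(head).expect("length checked above");
    Ok((array, rest))
}

/// Parse a hypothetical record: a little-endian u16 id followed by a
/// little-endian u32 length. Returns both values and the unread bytes.
pub fn parse_record(bytes: &[u8]) -> Result<(u16, u32, &[u8]), NotEnoughBytes> {
    let (id, rest) = take_array::<2>(bytes)?;
    let (len, rest) = take_array::<4>(rest)?;
    Ok((u16::from_le_bytes(*id), u32::from_le_bytes(*len), rest))
}
```

A caller that needs an exact length can check `rest.is_empty()` on the returned remainder; a caller parsing a sequence of records can pass the remainder straight into the next call.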

Dependencies

~1.5MB
~37K SLoC