whitehole

A simple, fast, intuitive parser combinator framework for Rust

6 releases (breaking)

new 0.5.0 Jan 28, 2025
0.4.0 Jan 9, 2025
0.3.0 Jan 5, 2025
0.2.0 Jan 1, 2025
0.0.1 Nov 24, 2024

MIT license

190KB
4.5K SLoC

Features

  • Simple: only a handful of combinators to remember: eat, take, next, till, wrap, recur.
  • Operator overloading: use + and | to compose combinators, and * to repeat a combinator.
  • Almost zero heap allocation: this framework only uses stack memory, except for recur, which uses some pointers for recursion.
  • Re-usable heap memory: store accumulated values in a parser-managed heap instead of re-allocating on each iteration.
  • Stateful: control the parsing flow with an optional custom state.
  • Safe by default, with unsafe variants for performance.
  • Supports both string (&str) and bytes (&[u8]) input.
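
To make the operator-overloading idea concrete, here is a minimal, self-contained sketch that uses only the standard library. It is not whitehole's implementation: the type `P` and its `run` method are hypothetical, the functions `eat` and `next` merely borrow the names listed above, and the sketch boxes its closures for brevity, whereas the framework itself is described as almost allocation-free.

```rust
use std::ops::{Add, BitOr, Mul};

// Hypothetical toy type, NOT whitehole's combinator type.
// A parser takes the remaining input and returns how many bytes it digested.
struct P(Box<dyn Fn(&str) -> Option<usize>>);

impl P {
    fn run(&self, input: &str) -> Option<usize> {
        (self.0)(input)
    }
}

// Accept one character that satisfies `pred`.
fn next(pred: impl Fn(char) -> bool + 'static) -> P {
    P(Box::new(move |s: &str| {
        let c = s.chars().next()?;
        pred(c).then(|| c.len_utf8())
    }))
}

// Accept exactly the character `ch`.
fn eat(ch: char) -> P {
    next(move |c| c == ch)
}

// `a + b`: run `a`, then run `b` on the rest of the input.
impl Add for P {
    type Output = P;
    fn add(self, rhs: P) -> P {
        P(Box::new(move |s: &str| {
            let a = self.run(s)?;
            let b = rhs.run(&s[a..])?;
            Some(a + b)
        }))
    }
}

// `a | b`: try `a`; if it rejects, try `b` on the same input.
impl BitOr for P {
    type Output = P;
    fn bitor(self, rhs: P) -> P {
        P(Box::new(move |s: &str| self.run(s).or_else(|| rhs.run(s))))
    }
}

// `a * n`: run `a` exactly `n` times in a row.
impl Mul<usize> for P {
    type Output = P;
    fn mul(self, n: usize) -> P {
        P(Box::new(move |s: &str| {
            let mut total = 0;
            for _ in 0..n {
                total += self.run(&s[total..])?;
            }
            Some(total)
        }))
    }
}
```

With these toy definitions, `eat('#') + next(|c| c.is_ascii_hexdigit()) * 6` accepts `"#FFA500"` and digests 7 bytes, because `*` binds tighter than `+` in Rust.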

Installation

cargo add whitehole

Examples

Here is a simple example that parses hexadecimal color codes:

use whitehole::{
  combinator::{eat, next},
  parser::Parser,
};

let double_hex = || {
  // Repeat a combinator with `*`.
  (next(|c| c.is_ascii_hexdigit()) * 2)
    // Convert the matched content to `u8`.
    .select(|ctx| u8::from_str_radix(ctx.content(), 16).unwrap())
    // Wrap `u8` into `(u8,)`; this is required by `+` below.
    .tuple()
};

// Concat multiple combinators with `+`.
// Tuple values will be concatenated into a single tuple.
// Here `() + (u8,) + (u8,) + (u8,)` will be `(u8, u8, u8)`.
let entry = eat('#') + double_hex() + double_hex() + double_hex();

let mut parser = Parser::builder().entry(entry).build("#FFA500");
let output = parser.parse().unwrap();
assert_eq!(output.digested, 7);
assert_eq!(output.value, (255, 165, 0));
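
The expected values in the assertions above can be cross-checked without the framework. The following standard-library sketch is only a comparison point, not part of whitehole; the function name `parse_hex_color` is hypothetical.

```rust
// Hypothetical helper using only the standard library.
// Parses "#RRGGBB" into its three channels, or returns None on malformed input.
fn parse_hex_color(s: &str) -> Option<(u8, u8, u8)> {
    let hex = s.strip_prefix('#')?;
    // Require exactly six ASCII hex digits, so the byte slicing below is safe.
    if hex.len() != 6 || !hex.bytes().all(|b| b.is_ascii_hexdigit()) {
        return None;
    }
    // Convert the two digits starting at byte offset `i` into a `u8`.
    let channel = |i: usize| u8::from_str_radix(&hex[i..i + 2], 16).ok();
    Some((channel(0)?, channel(2)?, channel(4)?))
}
```

For `"#FFA500"` this yields `Some((255, 165, 0))`, matching `output.value` in the example: `FF` is 255, `A5` is 165, and `00` is 0.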

Documentation

  • in_str: a procedural macro to generate a closure that checks if a character is in the provided literal string.
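
As a rough illustration of what "a closure that checks if a character is in the provided literal string" means, here are two hand-written equivalents in plain Rust. The macro's actual expansion is not shown here, so both functions (`in_abc` and `in_set`, hypothetical names) are assumptions about the behavior, not the generated code.

```rust
// Hypothetical stand-in for a closure generated from the literal "abc":
// the set is fixed at compile time, so membership is a simple pattern match.
fn in_abc() -> impl Fn(char) -> bool {
    |c: char| matches!(c, 'a' | 'b' | 'c')
}

// Runtime alternative: search the string on every call.
fn in_set(set: &'static str) -> impl Fn(char) -> bool {
    move |c: char| set.contains(c)
}
```

Either closure has the shape of a character predicate such as the `|c| c.is_ascii_hexdigit()` passed to `next` in the example above.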

Credits

This project is inspired by:

CHANGELOG

No runtime deps