5 releases

0.0.5 Dec 28, 2024
0.0.4 Aug 5, 2024
0.0.3 Jun 9, 2024
0.0.2 Jun 3, 2024
0.0.1 May 28, 2024

#537 in Rust patterns

112 downloads per month

MIT license

230KB
6K SLoC

teleparse

Work in progress: a proc-macro powered LL(1) parsing library

This library is comparable to serde, but for parsing: all you need to do is define the syntax as data types and call parse() on the root type.

Features:

  • Syntax tree defined by macro attributes on structs and enums - no separate grammar file
  • Proc-macro powered - no separate build step to generate parser code
  • Provides a #[test] to ensure the grammar is LL(1); a grammar that is not LL(1) otherwise fails at runtime
  • Utilities for parsing components into primitives such as tuples, options, and delimited lists
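
To make the LL(1) check in the feature list more concrete, here is a minimal standard-library-only sketch of the textbook condition: within each rule, the prediction sets of the alternatives (FIRST, plus FOLLOW when an alternative can derive ε) must not overlap. This is not teleparse's implementation or API; `Sym`, `Grammar`, `first_of_seq`, `merge`, `is_ll1`, and `expr_grammar` are hypothetical names chosen for illustration, and the sketch assumes every rule is reachable from the start rule.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// A grammar symbol: a terminal token or a reference to another rule.
enum Sym {
    Tok(&'static str),
    Rule(&'static str),
}

type Set = BTreeSet<&'static str>;
type Map = BTreeMap<&'static str, Set>;
/// Rule name -> alternatives; an empty alternative stands for ε.
type Grammar = BTreeMap<&'static str, Vec<Vec<Sym>>>;

/// FIRST of a symbol sequence, plus whether the whole sequence can be empty.
fn first_of_seq(seq: &[Sym], first: &Map, nullable: &Set) -> (Set, bool) {
    let mut out = Set::new();
    for sym in seq {
        match sym {
            Sym::Tok(t) => {
                out.insert(*t);
                return (out, false);
            }
            Sym::Rule(r) => {
                out.extend(first.get(r).into_iter().flatten().copied());
                if !nullable.contains(r) {
                    return (out, false);
                }
            }
        }
    }
    (out, true)
}

/// Adds `items` to `map[key]` and reports whether the set grew.
fn merge(map: &mut Map, key: &'static str, items: Set) -> bool {
    let entry = map.entry(key).or_default();
    let before = entry.len();
    entry.extend(items);
    entry.len() != before
}

/// Checks that, within each rule, the prediction sets of the alternatives
/// (FIRST, plus FOLLOW when the alternative can be empty) are disjoint.
fn is_ll1(g: &Grammar, start: &'static str) -> bool {
    let (mut nullable, mut first, mut follow) = (Set::new(), Map::new(), Map::new());
    follow.entry(start).or_default().insert("$"); // end-of-input marker
    // Grow NULLABLE, FIRST and FOLLOW together until nothing changes.
    let mut changed = true;
    while changed {
        changed = false;
        for (lhs, alts) in g {
            for alt in alts {
                let (tokens, can_be_empty) = first_of_seq(alt, &first, &nullable);
                changed |= merge(&mut first, *lhs, tokens);
                changed |= can_be_empty && nullable.insert(*lhs);
                for (i, sym) in alt.iter().enumerate() {
                    if let Sym::Rule(r) = sym {
                        let rest = &alt[i + 1..];
                        let (mut tokens, rest_empty) = first_of_seq(rest, &first, &nullable);
                        if rest_empty {
                            tokens.extend(follow.get(lhs).into_iter().flatten().copied());
                        }
                        changed |= merge(&mut follow, *r, tokens);
                    }
                }
            }
        }
    }
    g.iter().all(|(name, alts)| {
        let mut seen = Set::new();
        alts.iter().all(|alt| {
            let (mut predict, can_be_empty) = first_of_seq(alt, &first, &nullable);
            if can_be_empty {
                predict.extend(follow.get(name).into_iter().flatten().copied());
            }
            // `insert` returns false when another alternative already claimed the token.
            predict.into_iter().all(|t| seen.insert(t))
        })
    })
}

/// A recursive expression grammar (E, E', T, T', F) written out as data.
fn expr_grammar() -> Grammar {
    use Sym::{Rule as R, Tok as T};
    BTreeMap::from([
        ("E", vec![vec![R("T"), R("E'")]]),
        ("E'", vec![vec![T("+"), R("T"), R("E'")], vec![]]),
        ("T", vec![vec![R("F"), R("T'")]]),
        ("T'", vec![vec![T("*"), R("F"), R("T'")], vec![]]),
        ("F", vec![vec![T("("), R("E"), T(")")], vec![T("id")]]),
    ])
}
```

Under these assumptions, `is_ll1(&expr_grammar(), "E")` accepts the grammar, while a left-recursive variant such as E => E + T | T is rejected because both alternatives can start with id.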

Credits:

  • The lexer implementation is backed by the ridiculously fast logos library
  • The "Dragon Book", Compilers: Principles, Techniques, and Tools, by Alfred V. Aho, Monica S. Lam, Ravi Sethi, and Jeffrey D. Ullman

Progress:

  • Lexer/Tokens
    • Macro for terminals
  • Parser
    • LL(1) stuff
    • Macros
    • Semantic Tokens (token type applied later by the parser)
      • Tests
      • Documentation
    • Tests
    • Documentation
    • Hooks
  • Utility types (tp)
  • Static Metadata
    • Bench
    • Test
    • Documentation
  • mdBook
    • Chapters
      • derive_lexicon
      • derive_syntax
      • using tp
      • semantic tokens
      • hooks (1.1)
      • using parser data
    • second iteration to add links
  • Usability testing
  • crate documentation linking to the book

Grammars that are traditionally written with recursion can also be simplified using the built-in syntax types.

// with recursion
E  => T E'
E' => + T E' | ε
T  => F T'
T' => * F T' | ε
F  => ( E ) | id

// simplified
E  => T ( + T )*
T  => F ( * F )*
F  => ( E ) | id

The simplified grammar can then be implemented as:

use teleparse::prelude::*;

#[derive_lexicon]
#[teleparse(ignore(r"\s+"))]
pub enum TokenType {
    #[teleparse(regex(r"\w+"), terminal(Ident))]
    Ident,
    #[teleparse(terminal(
        OpAdd = "+",
        OpMul = "*",
    ))]
    Op,
    /// Parentheses
    #[teleparse(terminal(
        ParenOpen = "(",
        ParenClose = ")"
    ))]
    Paren,
}

#[derive_syntax]
#[teleparse(root)]
struct E(tp::Split<T, OpAdd>); // E -> T ( + T )*
#[derive_syntax]
struct T(tp::Split<F, OpMul>); // T -> F ( * F )*
#[derive_syntax]
enum F {
    Ident(Ident),
    Paren((ParenOpen, Box<E>, ParenClose)),
}

fn main() -> Result<(), teleparse::GrammarError> {
    let source = "(a+b)*(c+d)";
    // Parse the source, starting from the root syntax type E
    let _expr = E::parse(source)?;

    Ok(())
}
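
For comparison, a hand-written parser for the same simplified grammar might look roughly like the sketch below. It uses only the standard library and is not how teleparse works internally; `Token`, `Expr`, `Parsed`, `Parser`, `split`, and `parse` are hypothetical names for illustration. The `split` helper parses `item ( sep item )*` with a loop, which is the hand-rolled counterpart of expressing the rule as a delimited list instead of the recursive E' and T' rules.

```rust
#[derive(Clone, Debug, PartialEq)]
enum Token {
    Ident(String),
    Add,
    Mul,
    Open,
    Close,
}

/// Syntax tree that mirrors the simplified grammar.
#[derive(Debug, PartialEq)]
enum Expr {
    Sum(Vec<Expr>),     // E => T ( + T )*
    Product(Vec<Expr>), // T => F ( * F )*
    Ident(String),      // F => id
}

type Parsed<T> = Result<T, String>;

/// Splits the source into tokens, skipping whitespace.
fn lex(src: &str) -> Parsed<Vec<Token>> {
    let mut tokens = Vec::new();
    let mut chars = src.chars().peekable();
    while let Some(c) = chars.next() {
        let token = match c {
            c if c.is_whitespace() => continue,
            '+' => Token::Add,
            '*' => Token::Mul,
            '(' => Token::Open,
            ')' => Token::Close,
            c if c.is_alphanumeric() || c == '_' => {
                let mut ident = String::from(c);
                while let Some(n) = chars.next_if(|n| n.is_alphanumeric() || *n == '_') {
                    ident.push(n);
                }
                Token::Ident(ident)
            }
            other => return Err(format!("unexpected character {other:?}")),
        };
        tokens.push(token);
    }
    Ok(tokens)
}

struct Parser { tokens: Vec<Token>, pos: usize }

impl Parser {
    /// Consumes and returns the next token, if any.
    fn bump(&mut self) -> Option<Token> {
        let token = self.tokens.get(self.pos).cloned();
        self.pos += token.is_some() as usize;
        token
    }

    /// Parses `item ( sep item )*`; the loop stands in for the recursive E' and T' rules.
    fn split(&mut self, sep: Token, item: fn(&mut Self) -> Parsed<Expr>) -> Parsed<Vec<Expr>> {
        let mut items = vec![item(self)?];
        while self.tokens.get(self.pos) == Some(&sep) {
            self.pos += 1;
            items.push(item(self)?);
        }
        Ok(items)
    }

    fn expr(&mut self) -> Parsed<Expr> {
        Ok(Expr::Sum(self.split(Token::Add, Self::term)?))
    }

    fn term(&mut self) -> Parsed<Expr> {
        Ok(Expr::Product(self.split(Token::Mul, Self::factor)?))
    }

    /// F => ( E ) | id
    fn factor(&mut self) -> Parsed<Expr> {
        match self.bump() {
            Some(Token::Ident(name)) => Ok(Expr::Ident(name)),
            Some(Token::Open) => {
                let inner = self.expr()?;
                match self.bump() {
                    Some(Token::Close) => Ok(inner),
                    other => Err(format!("expected ')', found {other:?}")),
                }
            }
            other => Err(format!("expected identifier or '(', found {other:?}")),
        }
    }
}

/// Parses a whole source string and rejects trailing tokens.
fn parse(src: &str) -> Parsed<Expr> {
    let mut parser = Parser { tokens: lex(src)?, pos: 0 };
    let expr = parser.expr()?;
    match parser.bump() {
        None => Ok(expr),
        Some(extra) => Err(format!("unexpected trailing token {extra:?}")),
    }
}
```

In this sketch, `parse("(a+b)*(c+d)")` succeeds, while inputs such as `"a+"` or `"(a"` return an error. Everything here is boilerplate that the derive-based version above avoids writing by hand.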

Dependencies

~8.5MB
~135K SLoC