#compiler #glr #lr #bison #proc-macro #parser #yacc

rusty_lr_buildscript

Buildscript tools for rusty_lr

27 releases (breaking)

0.22.1 Oct 26, 2024
0.21.1 Oct 9, 2024

#84 in Parser tooling


404 downloads per month
Used in 5 crates (2 directly)

MIT/Apache

445KB
10K SLoC

rusty_lr

crates.io docs.rs

GLR, LR(1) and LALR(1) parser generator for Rust.

Please refer to docs.rs for detailed examples and documentation.

Cargo Features

  • build : Enable buildscript tools.
  • fxhash : In the parser table, replace std::collections::HashMap with FxHashMap from rustc-hash.
  • tree : Enable automatic syntax tree construction. Use this feature for debugging only, since it consumes much more memory and time.
  • error : Enable detailed parsing error messages for the Display and Debug traits. Use this feature for debugging only, since it consumes much more memory and time.

Features

  • GLR, LR(1) and LALR(1) parser generator
  • Multiple paths of parsing with GLR parser
  • Provides procedural macros and buildscript tools
  • Readable error messages, both for parsing and for building the grammar
  • Customizable reduce actions
  • Conflict resolution for ambiguous grammars
  • Partial support for regex patterns

Example

// this defines the `EParser` struct
// where `E` is the start symbol
lr1! {
    %userdata i32;           // userdata type
    %tokentype char;         // token type
    %start E;                // start symbol
    %eof '\0';               // eof token

    // token definition
    %token zero '0';
    %token one '1';
    %token two '2';
    %token three '3';
    %token four '4';
    %token five '5';
    %token six '6';
    %token seven '7';
    %token eight '8';
    %token nine '9';
    %token plus '+';
    %token star '*';
    %token lparen '(';
    %token rparen ')';
    %token space ' ';

    // conflict resolution
    %left [plus star];                  // left-associative: reduce first for tokens `plus` and `star`

    // context-free grammars
    Digit(char): [zero-nine];           // character set '0' to '9'

    Number(i32)                         // type assigned to production rule `Number`
        : space* Digit+ space*          // regex pattern
        { Digit.into_iter().collect::<String>().parse().unwrap() }; // this will be the value of `Number`
                                                                // reduce action written in Rust code

    A(f32): A plus a2=A {
        *data += 1;                     // access userdata by `data`
        println!( "{:?} {:?} {:?}", A, plus, a2 );
        A + a2   // this will be the value of `A`
    }
        | M
        ;

    M(f32): M star m2=M { M * m2 }
        | P
        ;

    P(f32): Number { Number as f32 }
        | space* lparen E rparen space* { E }
        ;

    E(f32) : A ;
}
let parser = EParser::new();         // generate `EParser`
let mut context = EContext::new();   // create context
let mut userdata: i32 = 0;           // define userdata

let input_sequence = "1 + 2 * ( 3 + 4 )";

// start feeding tokens
for token in input_sequence.chars() {
    match context.feed(&parser, token, &mut userdata) {
        //                      ^^^^^   ^^^^^^^^^^^^ userdata passed here as `&mut i32`
        //                     feed token
        Ok(_) => {}
        Err(e) => {
            match &e {   // borrow, so `e` can still be printed below
                EParseError::InvalidTerminal(invalid_terminal) => {
                    ...
                }
                EParseError::ReduceAction(error_from_reduce_action) => {
                    ...
                }
            }
            println!("{}", e);
            // println!( "{}", e.long_message( &parser, &context ) );
            return;
        }
    }
}
context.feed(&parser, '\0', &mut userdata).unwrap();    // feed `eof` token

let res = context.accept();   // get the value of start symbol
println!("{}", res);
println!("userdata: {}", userdata);
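
To make the expected result of the example above easier to check, here is a minimal hand-written sketch that mirrors the grammar's structure: `+` binds looser than `*`, both are left-associative, and spaces are skipped around numbers and parentheses. It does not use rusty_lr at all. `Eval`, `evaluate`, and the `additions` counter (standing in for the `i32` userdata) are hypothetical names introduced only for this illustration, and it assumes the generated parser resolves the grammar the same way.

```rust
use std::iter::Peekable;
use std::str::Chars;

/// Hand-written evaluator (NOT generated by rusty_lr); names are illustrative.
struct Eval<'a> {
    chars: Peekable<Chars<'a>>,
    additions: i32, // stands in for the `i32` userdata of the example
}

impl<'a> Eval<'a> {
    fn new(input: &'a str) -> Self {
        Eval { chars: input.chars().peekable(), additions: 0 }
    }

    fn skip_spaces(&mut self) {
        while self.chars.peek() == Some(&' ') {
            self.chars.next();
        }
    }

    // Mirrors `A: A plus A | M`, folded left to right.
    fn expr(&mut self) -> Option<f32> {
        let mut value = self.term()?;
        while self.chars.peek() == Some(&'+') {
            self.chars.next();
            value += self.term()?;
            self.additions += 1; // same side effect as `*data += 1`
        }
        Some(value)
    }

    // Mirrors `M: M star M | P`, folded left to right.
    fn term(&mut self) -> Option<f32> {
        let mut value = self.primary()?;
        while self.chars.peek() == Some(&'*') {
            self.chars.next();
            value *= self.primary()?;
        }
        Some(value)
    }

    // Mirrors `P: Number | space* lparen E rparen space*`.
    fn primary(&mut self) -> Option<f32> {
        self.skip_spaces();
        let value = if self.chars.peek() == Some(&'(') {
            self.chars.next();
            let inner = self.expr()?;
            if self.chars.next() != Some(')') {
                return None; // missing closing parenthesis
            }
            inner
        } else {
            // Collect one or more digits, like `Digit+` in `Number`.
            let mut digits = String::new();
            while let Some(c) = self.chars.peek().copied().filter(|c| c.is_ascii_digit()) {
                digits.push(c);
                self.chars.next();
            }
            digits.parse::<i32>().ok()? as f32
        };
        self.skip_spaces();
        Some(value)
    }
}

/// Returns the value of the expression and the number of additions performed,
/// or `None` if the input is not a complete, well-formed expression.
fn evaluate(input: &str) -> Option<(f32, i32)> {
    let mut eval = Eval::new(input);
    let value = eval.expr()?;
    if eval.chars.next().is_some() {
        return None; // trailing input that could not be consumed
    }
    Some((value, eval.additions))
}
```

For the input `"1 + 2 * ( 3 + 4 )"`, `evaluate` returns `Some((15.0, 2))`: the value 15 and two additions. If the generated `EParser` resolves the grammar as described, `res` and `userdata` in the example above should agree with these.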

Readable error messages (with codespan)

images/error1.png images/error2.png

  • This error message is generated by the buildscript tool, not the procedural macros.

Visualized syntax tree

images/tree.png

  • With tree feature enabled.

Detailed ParseError message

images/parse_error.png

  • With error feature enabled.

Syntax

See SYNTAX.md for details of grammar-definition syntax.

  • Bootstrap: the rusty_lr syntax parser is written in rusty_lr itself.

Contribution

  • Any contribution is welcome.
  • Please feel free to open an issue or pull request.

License (Since 2.8.0)

Either of MIT or Apache.

Other Examples

Dependencies

~1.3–8MB
~60K SLoC