37 releases

new 0.13.8 Nov 7, 2024
0.13.7 Jun 14, 2024
0.13.6 May 30, 2024
0.13.4 Jan 4, 2024
0.1.1 Dec 18, 2018

#17 in Parser tooling

Download history: between roughly 2,600 and 4,300 downloads per week from 2024-07-18 to 2024-10-31

15,780 downloads per month
Used in 13 crates (11 directly)

Apache-2.0/MIT

210KB
5K SLoC

cfgrammar

cfgrammar reads in grammar files, processes them, and provides a convenient API for operating on them. It may be of interest to those who manipulate grammars directly, or who wish to use custom types of parsers.


lib.rs:

A library for manipulating Context-Free Grammars (CFGs). It is impractical to fully homogenise all the types of grammars out there, so the aim is for different grammar types to have completely separate implementations. Code that wants to be generic over more than one grammar type can then use an "adapter" to homogenise the particular grammar types of interest. Currently this is a little academic, since only Yacc-style grammars are supported (albeit in several variants).

Unfortunately, CFG terminology is something of a mess. Some people use different terms for the same concept interchangeably; some use different terms to convey subtle differences of meaning (but without complete uniformity). "Token", "terminal", and "lexeme" are examples of this: they are synonyms in some tools and papers, but not in others.

In order to make this library somewhat coherent, we therefore use some basic terminology guidelines for major concepts (acknowledging that this will cause clashes with some grammar types).

  • A grammar is an ordered sequence of productions.
  • A production is an ordered sequence of symbols.
  • A rule maps a name to one or more productions.
  • A token is the name of a syntactic element.

For example, in the following Yacc grammar:

    R1: "a" "b" | R2;
    R2: "c";

the following statements are true:

  • There are three productions: 1: ["a", "b"], 2: ["R2"], and 3: ["c"].
  • There are two rules: R1 and R2. The mapping to productions is {R1: {1, 2}, R2: {3}}.
  • There are three tokens: a, b, and c.
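
The statements above can be checked against a small hand-built model. The following is a minimal sketch that uses only the Rust standard library; `Symbol`, `ToyGrammar`, and `example_grammar` are hypothetical names invented for illustration and are not part of cfgrammar's API. The text numbers productions from 1, whereas the sketch uses 0-based vector indices.

```rust
use std::collections::BTreeMap;

/// A symbol on the right-hand side of a production: either a reference to a
/// rule or a token. (Hypothetical type for illustration, not cfgrammar's own.)
#[derive(Debug, Clone, PartialEq, Eq)]
enum Symbol {
    Rule(&'static str),
    Token(&'static str),
}

/// A toy grammar: an ordered sequence of productions, plus a mapping from
/// each rule name to the indices of its productions.
#[derive(Debug)]
struct ToyGrammar {
    productions: Vec<Vec<Symbol>>,
    rules: BTreeMap<&'static str, Vec<usize>>,
}

impl ToyGrammar {
    /// Returns the token names in order of first appearance, without duplicates.
    fn tokens(&self) -> Vec<&'static str> {
        let mut seen: Vec<&'static str> = Vec::new();
        for prod in &self.productions {
            for sym in prod {
                if let Symbol::Token(name) = sym {
                    if !seen.contains(name) {
                        seen.push(*name);
                    }
                }
            }
        }
        seen
    }
}

/// Builds the grammar `R1: "a" "b" | R2; R2: "c";` by hand.
fn example_grammar() -> ToyGrammar {
    use Symbol::{Rule, Token};
    let productions = vec![
        vec![Token("a"), Token("b")], // production 1 in the text, index 0 here
        vec![Rule("R2")],             // production 2 in the text, index 1 here
        vec![Token("c")],             // production 3 in the text, index 2 here
    ];
    let mut rules = BTreeMap::new();
    rules.insert("R1", vec![0, 1]);
    rules.insert("R2", vec![2]);
    ToyGrammar { productions, rules }
}
```

Calling `example_grammar()` gives a value with three productions and two rules, and its `tokens()` method collects the three token names. Keeping productions in one flat vector and letting rules refer to them by index mirrors the terminology above, where a grammar is an ordered sequence of productions and a rule is only a name-to-productions mapping.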

cfgrammar makes the following guarantees about grammars:

  • Productions are numbered from 0 to prods_len() - 1 (inclusive).
  • Rules are numbered from 0 to rules_len() - 1 (inclusive).
  • Tokens are numbered from 0 to toks_len() - 1 (inclusive).
  • The StorageT type used to store production, rule, and token indices can be infallibly converted into usize (see TIdx and friends for more details).
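
To illustrate why dense numbering and an infallible conversion into usize are useful together, here is a minimal sketch of a typed index wrapper. `ProdIdx`, `all_prod_indices`, and `lookup` are hypothetical names written for this example, and the `Into<usize>` bound is an assumption chosen to keep the sketch within the standard library; cfgrammar's own index types and trait bounds may differ.

```rust
/// Hypothetical production index wrapping a small storage type `T`.
/// (Illustrative only; not cfgrammar's own index type.)
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct ProdIdx<T>(T);

/// Any storage type that converts into `usize` without failure gives an
/// index that does too, so indexing a slice never needs a fallible cast.
impl<T: Into<usize>> From<ProdIdx<T>> for usize {
    fn from(idx: ProdIdx<T>) -> usize {
        idx.0.into()
    }
}

/// Yields every index from 0 to `len - 1` (inclusive), matching the dense
/// numbering described in the guarantees above.
fn all_prod_indices(len: u16) -> impl Iterator<Item = ProdIdx<u16>> {
    (0..len).map(ProdIdx)
}

/// Looks up per-production data stored in a plain slice.
/// Panics if `idx` is out of bounds for `names`.
fn lookup<'a, T: Into<usize>>(names: &'a [&'a str], idx: ProdIdx<T>) -> &'a str {
    names[usize::from(idx)]
}
```

Because the indices are dense and start at 0, per-production data can live in an ordinary `Vec` or slice, and a narrow storage type such as `u16` keeps tables compact while still converting to `usize` at the point of use.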

For most current uses, the main functions to investigate are YaccGrammar::new() and YaccGrammar::new_with_storaget(), both of which take a Yacc grammar as input.

Dependencies

~3.5–5MB
~93K SLoC