1 unstable release
0.1.0 | Feb 20, 2020 |
denuocc grammar_tool
This tool aims to help in the development of predictive recursive descent parsers.
There are four main subcommands for grammar_tool:
- print shows and numbers every production in the grammar:

      $ cargo run -- print ./grammars/aho_ullman/example_5.3.yacc
      start: S
      terminals: a b
      0 S : ;
      1 S : a b A ;
      2 A : S a a ;
      3 A : b ;

  It shows the start nonterminal, which happens to be S. It shows that the terminals in this grammar are a and b. It then shows the 4 productions in this grammar.

- first shows the first set for every nonterminal in the grammar:

      $ cargo run -- first -k2 ./grammars/aho_ullman/example_5.3.yacc
      S :
      S : a b
      A : a a
      A : a b
      A : b

  This means that the nonterminal S can legally start with an empty string or the token string "ab". Similarly, the nonterminal A may begin with the token strings "aa", "ab", or just "b". Notably, A cannot be an empty string.

- follow shows the follow set for every nonterminal in a grammar:

      $ cargo run -- follow -k2 ./grammars/aho_ullman/example_5.3.yacc
      S :
      S : a a
      A :

  The S : line with nothing after the colon means that it is valid for the input string to end after parsing an S. The same goes for the A : line. The line S : a a means that the token string "aa" may sometimes legally follow after parsing an S.

- test will verify whether a grammar is LL(k) or explain why it is not:

      $ cargo run -- test -k1 ./grammars/aho_ullman/example_5.3.yacc
      grammar is not LL(1)
      $ cargo run -- test -k1 --explain ./grammars/aho_ullman/example_5.3.yacc
      productions [0, 1] cause LL-conflicts: [["a"]]
      production 0 S : ;
      production 1 S : a b A;
      conflicting suffix: ["a"]
      grammar is not LL(1)
      $ cargo run -- test -k2 ./grammars/aho_ullman/example_5.3.yacc
      grammar is strong LL(2)

  As the follow command will show, an S production may be followed by the sequence a a (which occurs in production #2, A : S a a). Thus, when the next token is a, it is impossible to tell with only 1 token of lookahead which of the two S productions to choose.
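The first, follow, and test subcommands described above can be approximated in a few lines. The following is a minimal Python sketch, not grammar_tool's actual implementation: it hard-codes the example 5.3 grammar, computes FIRST_k and FOLLOW_k by textbook fixed-point iteration, and reports a strong-LL(k) conflict whenever two productions of the same nonterminal share a lookahead string. All names (PRODUCTIONS, concat_k, first_of, first_sets, follow_sets, ll_conflicts) are hypothetical, and because the sketch follows the textbook definitions, its sets need not match the tool's output line for line.

```python
from itertools import product

# Example 5.3 grammar from the text, as (nonterminal, right-hand side) pairs.
# List positions play the role of the production numbers shown by `print`.
PRODUCTIONS = [
    ("S", ()),
    ("S", ("a", "b", "A")),
    ("A", ("S", "a", "a")),
    ("A", ("b",)),
]
START = "S"
NONTERMINALS = {lhs for lhs, _ in PRODUCTIONS}


def concat_k(left, right, k):
    """Concatenate every pair of token strings, keeping at most k tokens."""
    return {(x + y)[:k] for x, y in product(left, right)}


def first_of(symbols, first, k):
    """FIRST_k of a symbol sequence, given the current per-nonterminal table."""
    result = {()}
    for sym in symbols:
        sym_first = first[sym] if sym in NONTERMINALS else {(sym,)}
        result = concat_k(result, sym_first, k)
    return result


def first_sets(k):
    """Iterate to a fixed point to get FIRST_k for every nonterminal."""
    first = {nt: set() for nt in NONTERMINALS}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in PRODUCTIONS:
            new = first_of(rhs, first, k) - first[lhs]
            if new:
                first[lhs] |= new
                changed = True
    return first


def follow_sets(first, k):
    """Iterate to a fixed point to get FOLLOW_k for every nonterminal."""
    follow = {nt: set() for nt in NONTERMINALS}
    follow[START].add(())  # the input may end right after the start symbol
    changed = True
    while changed:
        changed = False
        for lhs, rhs in PRODUCTIONS:
            for i, sym in enumerate(rhs):
                if sym not in NONTERMINALS:
                    continue
                # What can follow `sym` here: the rest of the production,
                # then whatever can follow the production's own nonterminal.
                tail = concat_k(first_of(rhs[i + 1:], first, k), follow[lhs], k)
                new = tail - follow[sym]
                if new:
                    follow[sym] |= new
                    changed = True
    return follow


def ll_conflicts(k):
    """Return (i, j, shared lookaheads) for each conflicting production pair."""
    first = first_sets(k)
    follow = follow_sets(first, k)
    lookahead = [
        concat_k(first_of(rhs, first, k), follow[lhs], k)
        for lhs, rhs in PRODUCTIONS
    ]
    conflicts = []
    for i in range(len(PRODUCTIONS)):
        for j in range(i + 1, len(PRODUCTIONS)):
            shared = lookahead[i] & lookahead[j]
            if PRODUCTIONS[i][0] == PRODUCTIONS[j][0] and shared:
                conflicts.append((i, j, shared))
    return conflicts


if __name__ == "__main__":
    for k in (1, 2):
        found = ll_conflicts(k)
        verdict = "strong LL" if not found else "not strong LL"
        print(f"k={k}: {verdict}({k})", found)
```

For k = 1 the sketch reports a conflict between productions 0 and 1 on the lookahead a, and for k = 2 it finds none, which agrees with the test output shown above.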
Syntax
grammar_tool accepts a very simple grammar format similar to YACC. The input is split into two parts: the header and the body, separated by a line of just %%.
The header may contain a %token or %start line. The %token line declares identifiers that are terminals. The %start line specifies which nonterminal is the root of the grammar.
The body contains the definitions of every nonterminal. A definition may provide multiple productions using the | character. Any identifier referenced in a production, if not declared by a %token header, is assumed to be a nonterminal. A quoted string in a production is also a terminal.
    %token TERMINAL
    %start S
    %%
    S : variant1
      | variant2
      ;
    variant1 : TERMINAL TERMINAL;
    variant2 : "terminal" ;
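To make the format rules above concrete, here is a small Python sketch of a reader for this syntax. It is an illustration under assumptions, not the parser grammar_tool uses: the names EXAMPLE, TOKEN_RE, and parse_grammar are hypothetical, identifiers are assumed to consist of letters, digits, and underscores, quoted terminals are kept with their quotes, and a missing %start line simply leaves the start symbol as None.

```python
import re

EXAMPLE = '''\
%token TERMINAL
%start S
%%
S : variant1
  | variant2
  ;
variant1 : TERMINAL TERMINAL;
variant2 : "terminal" ;
'''

# A body token is a quoted string, an identifier, or one of `:`, `|`, `;`.
# The identifier character set is an assumption made for this sketch.
TOKEN_RE = re.compile(r'"[^"]*"|[A-Za-z_][A-Za-z0-9_]*|[:|;]')


def parse_grammar(text):
    """Return (start, terminals, productions) for a grammar in this format."""
    lines = text.splitlines()
    # The header and body are separated by a line containing just %%.
    split = next(i for i, line in enumerate(lines) if line.strip() == "%%")

    terminals, start = set(), None
    for line in lines[:split]:
        words = line.split()
        if words[:1] == ["%token"]:
            terminals.update(words[1:])
        elif words[:1] == ["%start"]:
            start = words[1]

    productions, lhs, rhs = [], None, []
    for tok in TOKEN_RE.findall("\n".join(lines[split + 1:])):
        if tok == ":":
            lhs = rhs.pop()  # the identifier just before ':' names the nonterminal
        elif tok in ("|", ";"):
            productions.append((lhs, tuple(rhs)))  # one alternative is complete
            rhs = []
        else:
            if tok.startswith('"'):
                terminals.add(tok)  # quoted strings are terminals too
            rhs.append(tok)
    return start, terminals, productions


if __name__ == "__main__":
    start, terminals, productions = parse_grammar(EXAMPLE)
    print("start:", start)
    print("terminals:", " ".join(sorted(terminals)))
    for number, (lhs, rhs) in enumerate(productions):
        print(number, lhs, ":", " ".join(rhs), ";")
```

Any symbol that appears in a production but not in the returned terminals set is a nonterminal, mirroring the rule stated above.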
License
This project is dual licensed under the terms of the MIT license and the Apache License Version 2.0 at your option. See ./LICENSE-MIT and ./LICENSE-APACHE for details.
Contribution
Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.