www.delorie.com/gnu/docs/bison/bison_98.html | search |
Buy the book! | |
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
As documented elsewhere (see section The Bison Parser Algorithm) Bison parsers are shift/reduce automata. In some cases (much more frequent than one would hope), looking at this automaton is required to tune or simply fix a parser. Bison provides two different representation of it, either textually or graphically (as a VCG file).
The textual file is generated when the options `--report' or `--verbose' are specified, see See section Invoking Bison. Its name is made by removing `.tab.c' or `.c' from the parser output file name, and adding `.output' instead. Therefore, if the input file is `foo.y', then the parser file is called `foo.tab.c' by default. As a consequence, the verbose output file is called `foo.output'.
The following grammar file, `calc.y', will be used in the sequel:
%token NUM STR %left '+' '-' %left '*' %% exp: exp '+' exp | exp '-' exp | exp '*' exp | exp '/' exp | NUM ; useless: STR; %% |
bison
reports:
calc.y: warning: 1 useless nonterminal and 1 useless rule calc.y:11.1-7: warning: useless nonterminal: useless calc.y:11.10-12: warning: useless rule: useless: STR calc.y: conflicts: 7 shift/reduce |
When given `--report=state', in addition to `calc.tab.c', it creates a file `calc.output' with contents detailed below. The order of the output and the exact presentation might vary, but the interpretation is the same.
The first section includes details on conflicts that were solved thanks to precedence and/or associativity:
Conflict in state 8 between rule 2 and token '+' resolved as reduce. Conflict in state 8 between rule 2 and token '-' resolved as reduce. Conflict in state 8 between rule 2 and token '*' resolved as shift. ... |
The next section lists states that still have conflicts.
State 8 conflicts: 1 shift/reduce State 9 conflicts: 1 shift/reduce State 10 conflicts: 1 shift/reduce State 11 conflicts: 4 shift/reduce |
The next section reports useless tokens, nonterminal and rules. Useless nonterminals and rules are removed in order to produce a smaller parser, but useless tokens are preserved, since they might be used by the scanner (note the difference between "useless" and "not used" below):
Useless nonterminals: useless Terminals which are not used: STR Useless rules: #6 useless: STR; |
The next section reproduces the exact grammar that Bison used:
Grammar Number, Line, Rule 0 5 $accept -> exp $end 1 5 exp -> exp '+' exp 2 6 exp -> exp '-' exp 3 7 exp -> exp '*' exp 4 8 exp -> exp '/' exp 5 9 exp -> NUM |
and reports the uses of the symbols:
Terminals, with rules where they appear $end (0) 0 '*' (42) 3 '+' (43) 1 '-' (45) 2 '/' (47) 4 error (256) NUM (258) 5 Nonterminals, with rules where they appear $accept (8) on left: 0 exp (9) on left: 1 2 3 4 5, on right: 0 1 2 3 4 |
Bison then proceeds onto the automaton itself, describing each state with it set of items, also known as pointed rules. Each item is a production rule together with a point (marked by `.') that the input cursor.
state 0 $accept -> . exp $ (rule 0) NUM shift, and go to state 1 exp go to state 2 |
This reads as follows: "state 0 corresponds to being at the very
beginning of the parsing, in the initial rule, right before the start
symbol (here, exp
). When the parser returns to this state right
after having reduced a rule that produced an exp
, the control
flow jumps to state 2. If there is no such transition on a nonterminal
symbol, and the lookahead is a NUM
, then this token is shifted on
the parse stack, and the control flow jumps to state 1. Any other
lookahead triggers a syntax error."
Even though the only active rule in state 0 seems to be rule 0, the
report lists NUM
as a lookahead symbol because NUM
can be
at the beginning of any rule deriving an exp
. By default Bison
reports the so-called core or kernel of the item set, but if
you want to see more detail you can invoke bison
with
`--report=itemset' to list all the items, include those that can
be derived:
state 0 $accept -> . exp $ (rule 0) exp -> . exp '+' exp (rule 1) exp -> . exp '-' exp (rule 2) exp -> . exp '*' exp (rule 3) exp -> . exp '/' exp (rule 4) exp -> . NUM (rule 5) NUM shift, and go to state 1 exp go to state 2 |
In the state 1...
state 1 exp -> NUM . (rule 5) $default reduce using rule 5 (exp) |
the rule 5, `exp: NUM;', is completed. Whatever the lookahead (`$default'), the parser will reduce it. If it was coming from state 0, then, after this reduction it will return to state 0, and will jump to state 2 (`exp: go to state 2').
state 2 $accept -> exp . $ (rule 0) exp -> exp . '+' exp (rule 1) exp -> exp . '-' exp (rule 2) exp -> exp . '*' exp (rule 3) exp -> exp . '/' exp (rule 4) $ shift, and go to state 3 '+' shift, and go to state 4 '-' shift, and go to state 5 '*' shift, and go to state 6 '/' shift, and go to state 7 |
In state 2, the automaton can only shift a symbol. For instance, because of the item `exp -> exp . '+' exp', if the lookahead if `+', it will be shifted on the parse stack, and the automaton control will jump to state 4, corresponding to the item `exp -> exp '+' . exp'. Since there is no default action, any other token than those listed above will trigger a syntax error.
The state 3 is named the final state, or the accepting state:
state 3 $accept -> exp $ . (rule 0) $default accept |
the initial rule is completed (the start symbol and the end of input were read), the parsing exits successfully.
The interpretation of states 4 to 7 is straightforward, and is left to the reader.
state 4 exp -> exp '+' . exp (rule 1) NUM shift, and go to state 1 exp go to state 8 state 5 exp -> exp '-' . exp (rule 2) NUM shift, and go to state 1 exp go to state 9 state 6 exp -> exp '*' . exp (rule 3) NUM shift, and go to state 1 exp go to state 10 state 7 exp -> exp '/' . exp (rule 4) NUM shift, and go to state 1 exp go to state 11 |
As was announced in beginning of the report, `State 8 conflicts: 1 shift/reduce':
state 8 exp -> exp . '+' exp (rule 1) exp -> exp '+' exp . (rule 1) exp -> exp . '-' exp (rule 2) exp -> exp . '*' exp (rule 3) exp -> exp . '/' exp (rule 4) '*' shift, and go to state 6 '/' shift, and go to state 7 '/' [reduce using rule 1 (exp)] $default reduce using rule 1 (exp) |
Indeed, there are two actions associated to the lookahead `/': either shifting (and going to state 7), or reducing rule 1. The conflict means that either the grammar is ambiguous, or the parser lacks information to make the right decision. Indeed the grammar is ambiguous, as, since we did not specify the precedence of `/', the sentence `NUM + NUM / NUM' can be parsed as `NUM + (NUM / NUM)', which corresponds to shifting `/', or as `(NUM + NUM) / NUM', which corresponds to reducing rule 1.
Because in LALR(1) parsing a single decision can be made, Bison arbitrarily chose to disable the reduction, see Shift/Reduce Conflicts. Discarded actions are reported in between square brackets.
Note that all the previous states had a single possible action: either shifting the next token and going to the corresponding state, or reducing a single rule. In the other cases, i.e., when shifting and reducing is possible or when several reductions are possible, the lookahead is required to select the action. State 8 is one such state: if the lookahead is `*' or `/' then the action is shifting, otherwise the action is reducing rule 1. In other words, the first two items, corresponding to rule 1, are not eligible when the lookahead is `*', since we specified that `*' has higher precedence that `+'. More generally, some items are eligible only with some set of possible lookaheads. When run with `--report=lookahead', Bison specifies these lookaheads:
state 8 exp -> exp . '+' exp [$, '+', '-', '/'] (rule 1) exp -> exp '+' exp . [$, '+', '-', '/'] (rule 1) exp -> exp . '-' exp (rule 2) exp -> exp . '*' exp (rule 3) exp -> exp . '/' exp (rule 4) '*' shift, and go to state 6 '/' shift, and go to state 7 '/' [reduce using rule 1 (exp)] $default reduce using rule 1 (exp) |
The remaining states are similar:
state 9 exp -> exp . '+' exp (rule 1) exp -> exp . '-' exp (rule 2) exp -> exp '-' exp . (rule 2) exp -> exp . '*' exp (rule 3) exp -> exp . '/' exp (rule 4) '*' shift, and go to state 6 '/' shift, and go to state 7 '/' [reduce using rule 2 (exp)] $default reduce using rule 2 (exp) state 10 exp -> exp . '+' exp (rule 1) exp -> exp . '-' exp (rule 2) exp -> exp . '*' exp (rule 3) exp -> exp '*' exp . (rule 3) exp -> exp . '/' exp (rule 4) '/' shift, and go to state 7 '/' [reduce using rule 3 (exp)] $default reduce using rule 3 (exp) state 11 exp -> exp . '+' exp (rule 1) exp -> exp . '-' exp (rule 2) exp -> exp . '*' exp (rule 3) exp -> exp . '/' exp (rule 4) exp -> exp '/' exp . (rule 4) '+' shift, and go to state 4 '-' shift, and go to state 5 '*' shift, and go to state 6 '/' shift, and go to state 7 '+' [reduce using rule 4 (exp)] '-' [reduce using rule 4 (exp)] '*' [reduce using rule 4 (exp)] '/' [reduce using rule 4 (exp)] $default reduce using rule 4 (exp) |
Observe that state 11 contains conflicts due to the lack of precedence of `/' wrt `+', `-', and `*', but also because the associativity of `/' is not specified.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
webmaster donations bookstore | delorie software privacy |
Copyright © 2003 by The Free Software Foundation | Updated Jun 2003 |