www.delorie.com/gnu/docs/bison/bison_11.html | search |
![]() Buy the book! | |
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
In some grammars, there will be cases where Bison's standard LALR(1) parsing algorithm cannot decide whether to apply a certain grammar rule at a given point. That is, it may not be able to decide (on the basis of the input read so far) which of two possible reductions (applications of a grammar rule) applies, or whether to apply a reduction or read more of the input and apply a reduction later in the input. These are known respectively as reduce/reduce conflicts (see section 5.6 Reduce/Reduce Conflicts), and shift/reduce conflicts (see section 5.2 Shift/Reduce Conflicts).
To use a grammar that is not easily modified to be LALR(1), a
more general parsing algorithm is sometimes necessary. If you include
%glr-parser
among the Bison declarations in your file
(see section 3.1 Outline of a Bison Grammar), the result will be a Generalized LR
(GLR) parser. These parsers handle Bison grammars that
contain no unresolved conflicts (i.e., after applying precedence
declarations) identically to LALR(1) parsers. However, when
faced with unresolved shift/reduce and reduce/reduce conflicts,
GLR parsers use the simple expedient of doing both,
effectively cloning the parser to follow both possibilities. Each of
the resulting parsers can again split, so that at any given time, there
can be any number of possible parses being explored. The parsers
proceed in lockstep; that is, all of them consume (shift) a given input
symbol before any of them proceed to the next. Each of the cloned
parsers eventually meets one of two possible fates: either it runs into
a parsing error, in which case it simply vanishes, or it merges with
another parser, because the two of them have reduced the input to an
identical set of symbols.
During the time that there are multiple parsers, semantic actions are recorded, but not performed. When a parser disappears, its recorded semantic actions disappear as well, and are never performed. When a reduction makes two parsers identical, causing them to merge, Bison records both sets of semantic actions. Whenever the last two parsers merge, reverting to the single-parser case, Bison resolves all the outstanding actions either by precedences given to the grammar rules involved, or by performing both actions, and then calling a designated user-defined function on the resulting values to produce an arbitrary merged result.
Let's consider an example, vastly simplified from a C++ grammar.
%{ #include <stdio.h> #define YYSTYPE char const * int yylex (void); void yyerror (char const *); %} %token TYPENAME ID %right '=' %left '+' %glr-parser %% prog : | prog stmt { printf ("\n"); } ; stmt : expr ';' %dprec 1 | decl %dprec 2 ; expr : ID { printf ("%s ", $$); } | TYPENAME '(' expr ')' { printf ("%s |
This models a problematic part of the C++ grammar--the ambiguity between certain declarations and statements. For example,
T (x) = y+z; |
parses as either an expr
or a stmt
(assuming that `T' is recognized as a TYPENAME
and
`x' as an ID
).
Bison detects this as a reduce/reduce conflict between the rules
expr : ID
and declarator : ID
, which it cannot resolve at the
time it encounters x
in the example above. The two %dprec
declarations, however, give precedence to interpreting the example as a
decl
, which implies that x
is a declarator.
The parser therefore prints
"x" y z + T <init-declare> |
Consider a different input string for this parser:
T (x) + y; |
Here, there is no ambiguity (this cannot be parsed as a declaration).
However, at the time the Bison parser encounters x
, it does not
have enough information to resolve the reduce/reduce conflict (again,
between x
as an expr
or a declarator
). In this
case, no precedence declaration is used. Instead, the parser splits
into two, one assuming that x
is an expr
, and the other
assuming x
is a declarator
. The second of these parsers
then vanishes when it sees +
, and the parser prints
x T <cast> y + |
Suppose that instead of resolving the ambiguity, you wanted to see all
the possibilities. For this purpose, we must merge the semantic
actions of the two possible parsers, rather than choosing one over the
other. To do so, you could change the declaration of stmt
as
follows:
stmt : expr ';' %merge <stmtMerge> | decl %merge <stmtMerge> ; |
and define the stmtMerge
function as:
static YYSTYPE stmtMerge (YYSTYPE x0, YYSTYPE x1) { printf ("<OR> "); return ""; } |
with an accompanying forward declaration in the C declarations at the beginning of the file:
%{ #define YYSTYPE char const * static YYSTYPE stmtMerge (YYSTYPE x0, YYSTYPE x1); %} |
With these declarations, the resulting parser will parse the first example
as both an expr
and a decl
, and print
"x" y z + T <init-declare> x T <cast> y z + = <OR> |
The GLR parsers require a compiler for ISO C89 or
later. In addition, they use the inline
keyword, which is not
C89, but is C99 and is a common extension in pre-C99 compilers. It is
up to the user of these parsers to handle
portability issues. For instance, if using Autoconf and the Autoconf
macro AC_C_INLINE
, a mere
%{ #include <config.h> %} |
will suffice. Otherwise, we suggest
%{ #if __STDC_VERSION__ < 199901 && ! defined __GNUC__ && ! defined inline #define inline #endif %} |
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
webmaster | delorie software privacy |
Copyright © 2003 by The Free Software Foundation | Updated Jun 2003 |