www.delorie.com/gnu/docs/bison/bison_11.html   search  
 
Buy the book!


Bison 1.875

[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

1.5 Writing GLR Parsers

In some grammars, there will be cases where Bison's standard LALR(1) parsing algorithm cannot decide whether to apply a certain grammar rule at a given point. That is, it may not be able to decide (on the basis of the input read so far) which of two possible reductions (applications of a grammar rule) applies, or whether to apply a reduction or read more of the input and apply a reduction later in the input. These are known respectively as reduce/reduce conflicts (see section 5.6 Reduce/Reduce Conflicts), and shift/reduce conflicts (see section 5.2 Shift/Reduce Conflicts).

To use a grammar that is not easily modified to be LALR(1), a more general parsing algorithm is sometimes necessary. If you include %glr-parser among the Bison declarations in your file (see section 3.1 Outline of a Bison Grammar), the result will be a Generalized LR (GLR) parser. These parsers handle Bison grammars that contain no unresolved conflicts (i.e., after applying precedence declarations) identically to LALR(1) parsers. However, when faced with unresolved shift/reduce and reduce/reduce conflicts, GLR parsers use the simple expedient of doing both, effectively cloning the parser to follow both possibilities. Each of the resulting parsers can again split, so that at any given time, there can be any number of possible parses being explored. The parsers proceed in lockstep; that is, all of them consume (shift) a given input symbol before any of them proceed to the next. Each of the cloned parsers eventually meets one of two possible fates: either it runs into a parsing error, in which case it simply vanishes, or it merges with another parser, because the two of them have reduced the input to an identical set of symbols.

During the time that there are multiple parsers, semantic actions are recorded, but not performed. When a parser disappears, its recorded semantic actions disappear as well, and are never performed. When a reduction makes two parsers identical, causing them to merge, Bison records both sets of semantic actions. Whenever the last two parsers merge, reverting to the single-parser case, Bison resolves all the outstanding actions either by precedences given to the grammar rules involved, or by performing both actions, and then calling a designated user-defined function on the resulting values to produce an arbitrary merged result.

Let's consider an example, vastly simplified from a C++ grammar.

 
%{
  #include <stdio.h>
  #define YYSTYPE char const *
  int yylex (void);
  void yyerror (char const *);
%}

%token TYPENAME ID

%right '='
%left '+'

%glr-parser

%%

prog :
     | prog stmt   { printf ("\n"); }
     ;

stmt : expr ';'  %dprec 1
     | decl      %dprec 2
     ;

expr : ID               { printf ("%s ", $$); }
     | TYPENAME '(' expr ')'
                        { printf ("%s  ", $1); }
     | expr '+' expr    { printf ("+ "); }
     | expr '=' expr    { printf ("= "); }
     ;

decl : TYPENAME declarator ';'
                        { printf ("%s  ", $1); }
     | TYPENAME declarator '=' expr ';'
                        { printf ("%s  ", $1); }
     ;

declarator : ID         { printf ("\"%s\" ", $1); }
     | '(' declarator ')'
     ;

This models a problematic part of the C++ grammar--the ambiguity between certain declarations and statements. For example,

 
T (x) = y+z;

parses as either an expr or a stmt (assuming that `T' is recognized as a TYPENAME and `x' as an ID). Bison detects this as a reduce/reduce conflict between the rules expr : ID and declarator : ID, which it cannot resolve at the time it encounters x in the example above. The two %dprec declarations, however, give precedence to interpreting the example as a decl, which implies that x is a declarator. The parser therefore prints

 
"x" y z + T <init-declare>

Consider a different input string for this parser:

 
T (x) + y;

Here, there is no ambiguity (this cannot be parsed as a declaration). However, at the time the Bison parser encounters x, it does not have enough information to resolve the reduce/reduce conflict (again, between x as an expr or a declarator). In this case, no precedence declaration is used. Instead, the parser splits into two, one assuming that x is an expr, and the other assuming x is a declarator. The second of these parsers then vanishes when it sees +, and the parser prints

 
x T <cast> y +

Suppose that instead of resolving the ambiguity, you wanted to see all the possibilities. For this purpose, we must merge the semantic actions of the two possible parsers, rather than choosing one over the other. To do so, you could change the declaration of stmt as follows:

 
stmt : expr ';'  %merge <stmtMerge>
     | decl      %merge <stmtMerge>
     ;

and define the stmtMerge function as:

 
static YYSTYPE
stmtMerge (YYSTYPE x0, YYSTYPE x1)
{
  printf ("<OR> ");
  return "";
}

with an accompanying forward declaration in the C declarations at the beginning of the file:

 
%{
  #define YYSTYPE char const *
  static YYSTYPE stmtMerge (YYSTYPE x0, YYSTYPE x1);
%}

With these declarations, the resulting parser will parse the first example as both an expr and a decl, and print

 
"x" y z + T <init-declare> x T <cast> y z + = <OR>

The GLR parsers require a compiler for ISO C89 or later. In addition, they use the inline keyword, which is not C89, but is C99 and is a common extension in pre-C99 compilers. It is up to the user of these parsers to handle portability issues. For instance, if using Autoconf and the Autoconf macro AC_C_INLINE, a mere

 
%{
  #include <config.h>
%}

will suffice. Otherwise, we suggest

 
%{
  #if __STDC_VERSION__ < 199901 && ! defined __GNUC__ && ! defined inline
   #define inline
  #endif
%}


[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

  webmaster   donations   bookstore     delorie software   privacy  
  Copyright 2003   by The Free Software Foundation     Updated Jun 2003