| www.delorie.com/gnu/docs/flex/flex_7.html | search |
The patterns in the input are written using an extended set of regular expressions. These are:
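    x          match the character 'x'
    .          any character (byte) except newline
    [xyz]      a "character class"; in this case, the pattern matches either an 'x', a 'y', or a 'z'
    [abj-oZ]   a "character class" with a range in it; matches an 'a', a 'b', any letter from 'j' through 'o', or a 'Z'
    [^A-Z]     a "negated character class", i.e., any character but those in the class. In this case, any character EXCEPT an uppercase letter.
    [^A-Z\n]   any character EXCEPT an uppercase letter or a newline
    r*         zero or more r's, where r is any regular expression
    r+         one or more r's
    r?         zero or one r's (that is, "an optional r")
    r{2,5}     anywhere from two to five r's
    r{2,}      two or more r's
    r{4}       exactly 4 r's
    {name}     the expansion of the "name" definition (see above)
    "[xyz]\"foo"   the literal string: [xyz]"foo
    \X         if X is an 'a', 'b', 'f', 'n', 'r', 't', or 'v', then the ANSI-C interpretation of \X. Otherwise, a literal 'X' (used to escape operators such as '*')
    \0         a NUL character (ASCII code 0)
    \123       the character with octal value 123
    \x2a       the character with hexadecimal value 2a

    (r)        match an r; parentheses are used to override precedence (see below)

    rs         the regular expression r followed by the regular expression s; called "concatenation"

    r|s        either an r or an s

    r/s        an r but only if it is followed by an s. The text matched by s is included when determining whether this rule is the "longest match", but is then returned to the input before the action is executed. So the action only sees the text matched by r. This type of pattern is called "trailing context". (There are some combinations of r/s that flex cannot match correctly; see notes in the Deficiencies / Bugs section below regarding "dangerous trailing context".)
    ^r         an r, but only at the beginning of a line (i.e., when just starting to scan, or right after a newline has been scanned)
    r$         an r, but only at the end of a line (i.e., just before a newline). Equivalent to "r/\n".

    <s>r       an r, but only in start condition s
    <s1,s2,s3>r    same, but in any of start conditions s1, s2, or s3
    <*>r       an r in any start condition, even an exclusive one

    <<EOF>>    an end-of-file
    <s1,s2><<EOF>> an end-of-file when in start condition s1 or s2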
Note that flex's notion of "newline" is exactly whatever the C compiler used to compile flex interprets '\n' as; in particular, on some DOS systems you must either filter out \r's in the input yourself, or explicitly use r/\r\n for "r$".
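One way to "filter out \r's in the input yourself" is to strip them from each buffer before the scanner sees it. The following is only a sketch in C: strip_cr() is a hypothetical helper, not part of flex, and it assumes the whole CR-LF pair sits inside the buffer it is given.

```c
#include <stddef.h>

/* Hypothetical helper (not a flex API): remove every '\r' that is
 * immediately followed by '\n', rewriting the buffer in place.
 * Returns the new length. A lone '\r' is left untouched. */
size_t strip_cr(char *buf, size_t len)
{
    size_t out = 0;

    for (size_t i = 0; i < len; i++) {
        if (buf[i] == '\r' && i + 1 < len && buf[i + 1] == '\n')
            continue;               /* drop the CR of a CR-LF pair */
        buf[out++] = buf[i];
    }
    return out;
}
```

A filter like this could, for instance, be called from a redefined YY_INPUT. A CR-LF pair that is split across two buffers would slip through, so a real filter would need to remember a trailing '\r' between calls.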
Note that inside of a character class, all regular expression operators lose their special meaning except escape ('\') and the character class operators, '-', ']', and, at the beginning of the class, '^'.
The regular expressions listed above are grouped according to precedence, from highest precedence at the top to lowest at the bottom. Those grouped together have equal precedence. For example,
    foo|bar*
is the same as
    (foo)|(ba(r*))
since the '*' operator has higher precedence than concatenation, and concatenation higher than alternation ('|'). This pattern therefore matches either the string "foo" or the string "ba" followed by zero-or-more r's. To match "foo" or zero-or-more "bar"'s, use:
    foo|(bar)*
and to match zero-or-more "foo"'s-or-"bar"'s:
    (foo|bar)*
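These groupings can be checked outside of flex. POSIX extended regular expressions give '*', concatenation, and '|' the same relative precedence, so the sketch below uses regcomp() and regexec() from <regex.h> as a stand-in for flex. The helper name matches_whole() is hypothetical, and the anchoring with "^(...)$" is only there to force a whole-string match; it is not flex syntax being demonstrated.

```c
#include <regex.h>
#include <stdio.h>

/* Hypothetical helper: return 1 if the POSIX ERE `pattern` matches
 * the whole of `text`, 0 if it does not, and -1 if the pattern is
 * too long or does not compile. */
int matches_whole(const char *pattern, const char *text)
{
    char anchored[256];
    regex_t re;
    int n, rc;

    /* Wrap the pattern so that it must span the entire string. */
    n = snprintf(anchored, sizeof anchored, "^(%s)$", pattern);
    if (n < 0 || (size_t)n >= sizeof anchored)
        return -1;

    if (regcomp(&re, anchored, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;

    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

With this helper, "foo|bar*" accepts "foo", "ba", and "barrr" but rejects "barbar"; "foo|(bar)*" accepts "barbar" but rejects "foofoo"; and "(foo|bar)*" accepts "foobarfoo".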
In addition to characters and ranges of characters, character classes can also contain character class expressions. These are expressions enclosed inside `[:' and `:]' delimiters (which themselves must appear between the '[' and ']' of the character class; other elements may occur inside the character class, too). The valid expressions are:
    [:alnum:] [:alpha:] [:blank:]
    [:cntrl:] [:digit:] [:graph:]
    [:lower:] [:print:] [:punct:]
    [:space:] [:upper:] [:xdigit:]
These expressions all designate a set of characters equivalent to the corresponding standard C `isXXX' function. For example, `[:alnum:]' designates those characters for which `isalnum()' returns true, i.e., any alphabetic or numeric character. Some systems don't provide `isblank()', so flex defines `[:blank:]' as a blank or a tab.
For example, the following character classes are all equivalent:
    [[:alnum:]]
    [[:alpha:][:digit:]]
    [[:alpha:]0-9]
    [a-zA-Z0-9]
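Since the class expressions are defined in terms of the C `isXXX' functions, the equivalence above can be checked directly with <ctype.h>. The sketch below is illustrative only; the function names are hypothetical, and the comparison against the literal ranges assumes the default "C" locale and an ASCII character set, where a-z, A-Z, and 0-9 are contiguous.

```c
#include <ctype.h>

/* Return 1 if isalnum() agrees with isalpha() || isdigit() for every
 * unsigned char value, i.e. [[:alnum:]] == [[:alpha:][:digit:]]. */
int alnum_equals_alpha_or_digit(void)
{
    for (int c = 0; c <= 255; c++) {
        int a = isalnum(c) != 0;
        int b = isalpha(c) != 0 || isdigit(c) != 0;
        if (a != b)
            return 0;
    }
    return 1;
}

/* Return 1 if isalnum() agrees with the literal ranges [a-zA-Z0-9].
 * Assumption: "C" locale and ASCII; other locales may differ. */
int alnum_equals_ascii_ranges(void)
{
    for (int c = 0; c <= 255; c++) {
        int a = isalnum(c) != 0;
        int b = (c >= 'a' && c <= 'z') ||
                (c >= 'A' && c <= 'Z') ||
                (c >= '0' && c <= '9');
        if (a != b)
            return 0;
    }
    return 1;
}

/* The definition the text gives for [:blank:]: a blank or a tab. */
int flex_blank(int c)
{
    return c == ' ' || c == '\t';
}
```

Both checks are expected to return 1 under the stated assumptions; in a locale with additional alphabetic characters, only the first one is guaranteed to.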
If your scanner is case-insensitive (the `-i' flag), then `[:upper:]' and `[:lower:]' are equivalent to `[:alpha:]'.
Some notes on patterns:
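The following are illegal, because a rule can have at most one instance of trailing context (the '/' or '$' operator), and a start condition may appear only at the beginning of a pattern:
    foo/bar$
    <sc1>foo<sc2>bar
Note that the first of these can be written "foo/bar\n".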
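To make the meaning of "foo/bar\n" concrete, here is a hand-written C sketch of what such a rule accepts. It is not how flex implements trailing context, and the function name is hypothetical; it only illustrates that the trailing "bar\n" must be present but is not consumed.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical emulation of the rule  foo/bar\n  at the start of
 * `input`. Returns the number of characters consumed (3, the length
 * of "foo") when "foo" is followed by "bar\n", and 0 otherwise. */
size_t match_foo_before_bar_newline(const char *input)
{
    static const char head[] = "foo";
    static const char tail[] = "bar\n";
    const size_t head_len = sizeof head - 1;
    const size_t tail_len = sizeof tail - 1;

    if (strncmp(input, head, head_len) != 0)
        return 0;
    if (strncmp(input + head_len, tail, tail_len) != 0)
        return 0;

    /* The trailing context was checked but stays in the input. */
    return head_len;
}
```

For the input "foobar\n" the function returns 3, so the next scan would start at "bar\n"; for "foobar" without a newline, or for "foobaz\n", it returns 0.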
The following will result in '$' or '^' being treated as a normal character:
    foo|(bar$)
    foo|^bar
If what's wanted is a "foo" or a bar-followed-by-a-newline, the following could be used (the special '|' action is explained below):
    foo      |
    bar$     /* action goes here */
A similar trick will work for matching a foo or a bar-at-the-beginning-of-a-line.
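The beginning-of-line variant can be pictured with a small C emulation. This is a sketch under stated assumptions, not flex's own mechanism: the function name is hypothetical, `pos` must not exceed the length of `text`, and "beginning of a line" is taken to mean offset 0 or the position right after a '\n'. The comment shows the pair of rules being imitated.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical emulation of the two rules
 *
 *     foo      |
 *     ^bar     action
 *
 * tried at offset `pos` of `text`. Returns the match length (3) if
 * either rule applies there, or 0 if neither does. Both rules share
 * one action, which is what the '|' action expresses. */
size_t match_foo_or_bol_bar(const char *text, size_t pos)
{
    int at_bol = (pos == 0) || (text[pos - 1] == '\n');

    if (strncmp(text + pos, "foo", 3) == 0)
        return 3;                   /* "foo" matches anywhere */
    if (at_bol && strncmp(text + pos, "bar", 3) == 0)
        return 3;                   /* "bar" only at line start */
    return 0;
}
```

For example, "bar" is accepted at offset 0 of "bar" and at offset 2 of "x\nbar", but not at offset 1 of "xbar", while "foo" is accepted at offset 1 of "xfoo".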
| Copyright © 2003 by The Free Software Foundation | Updated Jun 2003 |