You can combine regular expressions with special characters, called regular expression operators or metacharacters, to increase the power and versatility of regular expressions.
The escape sequences described earlier in 3.2 Escape Sequences are valid inside a regexp. They are introduced by a `\' and are recognized and converted into the corresponding real characters as the very first step in processing regexps.
Here is a list of metacharacters. All characters that are not escape sequences and that are not listed in the table stand for themselves:
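\
This suppresses the special meaning of a character when matching. For example, `\$' matches the character `$'.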
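^
This matches the beginning of a string. The `^' is known as an anchor, because it anchors the pattern to match only at the beginning of the string.
It is important to realize that `^' does not match the beginning of a line embedded in a string. The condition is not true in the following example:
if ("line1\nLINE 2" ~ /^L/) ...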
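$
This is similar to `^', but it matches only at the end of a string. The `$' is an anchor and does not match the end of a line embedded in a string. The condition in the following example is not true:
if ("line1\nLINE 2" ~ /1$/) ...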
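.
This matches any single character, including the newline character. For example, `.P' matches any single character followed by a `P' in a string.
In strict POSIX mode (see section Command-Line Options), `.' does not match the NUL character, which is a character with all bits equal to zero. Otherwise, NUL is just another character. Other versions of awk may not be able to match the NUL character.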
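[...]
This is called a character list. It matches any one of the characters that are enclosed in the square brackets. For example, `[MVX]' matches any one of the characters `M', `V', or `X' in a string.
[^ ...]
This is a complemented character list. The first character after the `[' must be a `^'. It matches any character except those in the square brackets. For example, `[^0-9]' matches any character that is not a digit.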
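|
This is the alternation operator, and it is used to specify alternatives. For example, `^P|[0-9]' matches any string that matches either `^P' or `[0-9]'. This means it matches any string that starts with `P' or contains a digit.
The alternation applies to the largest possible regexps on either side.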
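(...)
Parentheses are used for grouping in regular expressions, as in arithmetic. They can be used to concatenate regular expressions containing the alternation operator, `|'. For example, `@(samp|code)' matches both `@samp' and `@code'.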
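*
This symbol means that the preceding regular expression should be repeated as many times as necessary to find a match. For example, `ph*' applies the `*' symbol to the preceding `h' and looks for matches of one `p' followed by any number of `h's. This also matches just `p' if no `h's are present.
The `*' repeats the smallest possible preceding expression. (Use parentheses if you want to repeat a larger expression.) It finds as many repetitions as possible. For example, `awk '/\(c[ad][ad]*r x\)/ { print }' sample' prints every record in `sample' containing a string of the form `(car x)', `(cdr x)', `(cadr x)', and so on. Notice the escaping of the parentheses by preceding them with backslashes.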
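+
This symbol is similar to `*', except that the preceding expression must be matched at least once. This means that `wh+y' matches `why' and `whhy', but not `wy', whereas `wh*y' matches all three of these strings. The following is a simpler way of writing the last `*' example:
awk '/\(c[ad]+r x\)/ { print }' sample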
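?
This symbol is similar to `*', except that the preceding expression can be matched either once or not at all. For example, `fe?d' matches `fed' and `fd', but nothing else.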
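{n}
{n,}
{n,m}
One or two numbers inside braces denote an interval expression. If there is one number in the braces, the preceding regexp is repeated n times. If there are two numbers separated by a comma, the preceding regexp is repeated n to m times. If there is one number followed by a comma, then the preceding regexp is repeated at least n times:
wh{3}y
Matches `whhhy', but not `why' or `whhhhy'.
wh{3,5}y
Matches `whhhy', `whhhhy', or `whhhhhy', only.
wh{2,}y
Matches `whhy' or `whhhy', and so on.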
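The operators in the list above, apart from interval expressions, can be tried out with a short script. This is only a sketch: `demo_operators' and `show' are hypothetical names, the test strings are made up for illustration, and `[0-9]' is used rather than a POSIX character class so that the script does not depend on one particular awk.

```shell
#!/bin/sh
# Sketch: print each regexp, the string it is tested on, and 1 for a
# match or 0 for no match. demo_operators and show() are hypothetical.
demo_operators() {
    awk '
    function show(re_text, str_text, matched) {
        print re_text " on " str_text " -> " matched
    }
    BEGIN {
        s = "line1\nLINE 2"
        # Anchors apply to the whole string, not to embedded lines.
        show("/^L/", "two-line string", (s ~ /^L/))
        show("/1$/", "two-line string", (s ~ /1$/))
        show("/2$/", "two-line string", (s ~ /2$/))
        # Complemented character list: any character that is not a digit.
        show("/[^0-9]/", "7", ("7" ~ /[^0-9]/))
        # Alternation and grouping.
        show("/^P|[0-9]/", "x7", ("x7" ~ /^P|[0-9]/))
        show("/@(samp|code)/", "@code", ("@code" ~ /@(samp|code)/))
        # Repetition: zero or more, one or more, zero or one.
        show("/wh*y/", "wy", ("wy" ~ /wh*y/))
        show("/wh+y/", "wy", ("wy" ~ /wh+y/))
        show("/fe?d/", "fd", ("fd" ~ /fe?d/))
        show("/fe?d/", "feed", ("feed" ~ /fe?d/))
    }'
}

demo_operators
```

The first two lines of output report 0, which is the behavior of the two `if' conditions shown earlier for `^' and `$'.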
Interval expressions were not traditionally available in awk. They were added as part of the POSIX standard to make awk and egrep consistent with each other.
However, because old programs may use `{' and `}' in regexp constants, by default gawk does not match interval expressions in regexps. If either `--posix' or `--re-interval' is specified (see section Command-Line Options), then interval expressions are allowed in regexps.
For new programs that use `{' and `}' in regexp constants, it is good practice to always escape them with a backslash. Then the regexp constants are valid and work the way you want them to, using any version of awk.(13)
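As a small illustration of escaping braces, the following sketch looks for the literal text `a{2}' in its input. The helper name `brace_lines' and the sample input lines are hypothetical. Because both braces are escaped, the regexp constant should mean the same thing whether or not the awk in use treats unescaped braces as an interval expression.

```shell
#!/bin/sh
# brace_lines is a hypothetical helper: print the input lines that contain
# the literal text a{2}. Both braces are escaped with a backslash, so they
# are not read as an interval expression.
brace_lines() {
    awk '/a\{2\}/ { print }'
}

printf '%s\n' 'x = a{2}' 'x = aa' | brace_lines
```

Only the first input line is printed. The second line contains two `a' characters but no braces, so it does not match.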
In regular expressions, the `*', `+', and `?' operators, as well as the braces `{' and `}', have the highest precedence, followed by concatenation, and finally by `|'. As in arithmetic, parentheses can change how operators are grouped.
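The precedence rules can be checked with a few contrasting regexps. This is a sketch with made-up test strings, and `demo_precedence' and `show' are hypothetical names. Each line of output ends in 1 for a match or 0 for no match.

```shell
#!/bin/sh
# Sketch: compare regexps with and without parentheses.
# demo_precedence and show() are hypothetical names.
demo_precedence() {
    awk '
    function show(re_text, str_text, matched) {
        print re_text " on " str_text " -> " matched
    }
    BEGIN {
        # Repetition binds tighter than concatenation: + repeats only b.
        show("/^ab+$/", "abab", ("abab" ~ /^ab+$/))
        show("/^(ab)+$/", "abab", ("abab" ~ /^(ab)+$/))
        # Concatenation binds tighter than alternation, so the first
        # regexp below means: starts with a, or ends with bc.
        show("/^a|bc$/", "axyz", ("axyz" ~ /^a|bc$/))
        show("/^(a|b)c$/", "axyz", ("axyz" ~ /^(a|b)c$/))
        show("/^(a|b)c$/", "bc", ("bc" ~ /^(a|b)c$/))
    }'
}

demo_precedence
```

Without parentheses, `^ab+$' accepts `abbb' but not `abab', and `^a|bc$' accepts any string that merely starts with `a'. The parenthesized forms regroup the operators, just as parentheses do in arithmetic.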
In POSIX awk and gawk, the `*', `+', and `?' operators stand for themselves when there is nothing in the regexp that precedes them. For example, `/+/' matches a literal plus sign. However, many other versions of awk treat such a usage as a syntax error.
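For a program that has to run under several versions of awk, one way to avoid depending on this behavior is to put the operator inside a character list, or to escape it with a backslash. The following sketch takes the first approach. The helper name `plus_lines' and the sample input are hypothetical.

```shell
#!/bin/sh
# plus_lines is a hypothetical helper: print the input lines that contain
# a literal plus sign. Inside a character list, + stands for itself, so
# this does not rely on how a bare /+/ is treated. /\+/ is an alternative.
plus_lines() {
    awk '/[+]/ { print }'
}

printf '%s\n' '1 + 2' '1 - 2' 'a+b' | plus_lines
```

This prints the first and third input lines, which are the ones that contain a plus sign.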
If gawk is in compatibility mode (see section Command-Line Options), POSIX character classes and interval expressions are not available in regular expressions.
| Copyright © 2003 by The Free Software Foundation | Updated Jun 2003 |