GNU ed reference manual
4. Regular expressions
Regular expressions are patterns used in selecting text. For example,
the ed command
  g/string/
prints all lines containing string. Regular expressions are also
used by the `s' command for selecting old text to be replaced with
new.
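As a rough illustration of these two uses, the sketch below substitutes grep and sed for ed's global and `s' commands. It assumes GNU grep and GNU sed, whose basic regular expressions follow essentially the same syntax as ed's. The sample text and the function names are invented for the example.

```shell
# Sample text (made up for this example), one record per line.
sample() {
    printf '%s\n' 'first string' 'second line' 'third string'
}

# Roughly what a global command with the pattern "string" does:
# print every line that contains a match.
select_lines() {
    sample | grep -e 'string'
}

# Roughly what the s command does on each addressed line:
# replace the matched (old) text with new text.
replace_text() {
    sample | sed -e 's/string/text/'
}

select_lines
replace_text
```

The first function prints the two lines containing `string'. The second prints all three lines, with `string' replaced by `text' where it occurs.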
In addition to specifying string literals, regular expressions can
represent classes of strings. Strings thus represented are said to be
matched by the corresponding regular expression. If it is possible for
a regular expression to match several strings in a line, then the
left-most longest match is the one selected.
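The left-most longest rule can be observed with a small sketch. It assumes GNU sed as a stand-in for ed's `s' command, and show_match is a hypothetical helper that brackets whatever text the pattern selects.

```shell
# show_match RE STRING: wrap the text selected by RE in [...].
# RE is pasted into the sed script, so it must not contain a '/'.
show_match() {
    printf '%s\n' "$2" | sed -e "s/$1/[&]/"
}

# Two candidate matches exist: "abbb" at offset 1 and "abbbbb" at offset 6.
# The left-most candidate wins, and it is extended as far as possible.
show_match 'ab*' 'xabbbcabbbbb'    # prints x[abbb]cabbbbb
```

The later candidate is longer, but position takes priority over length.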
The following symbols are used in constructing regular expressions:
c
- Any character c not listed below, including `{', `}',
`(', `)', `<' and `>', matches itself.
\c
- Any backslash-escaped character c, other than `{',
`}', `(', `)', `<', `>', `b', `B',
`w', `W', `+' and `?', matches itself.
.
- Matches any single character.
[char-class]
- Matches any single character in char-class. To include a `]'
in char-class, it must be the first character. A range of
characters may be specified by separating the end characters of the
range with a `-', e.g., `a-z' specifies the lower case
characters. The following literal expressions can also be used in
char-class to specify sets of characters:
  [:alnum:]  [:cntrl:]  [:lower:]  [:space:]
  [:alpha:]  [:digit:]  [:print:]  [:upper:]
  [:blank:]  [:graph:]  [:punct:]  [:xdigit:]
If `-' appears as the first or last character of char-class,
then it matches itself. All other characters in char-class match
themselves.
Patterns in char-class of the form:
  [.col-elm.] or,
  [=col-elm=]
where col-elm is a collating element are interpreted
according to locale (5) (not currently supported). See
regex (3) for an explanation of these constructs.
[^char-class]
- Matches any single character, other than newline, not in
char-class. char-class is defined as above.
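The bracket-expression rules above can be tried out with the following sketch. It assumes GNU grep as a stand-in for ed's matcher and pins the locale to C so that ranges and classes behave as in ASCII. The helpers matches and report are hypothetical names.

```shell
# matches RE STRING: exit status 0 if STRING contains a match for RE.
matches() {
    printf '%s\n' "$2" | LC_ALL=C grep -q -e "$1"
}

# report RE STRING: print whether RE matches somewhere in STRING.
report() {
    if matches "$1" "$2"; then
        printf '%s matches "%s"\n' "$1" "$2"
    else
        printf '%s does not match "%s"\n' "$1" "$2"
    fi
}

report '[a-z]' 'ABC d'           # range: the "d" is a lower case character
report '[a-z]' 'ABC'             # no lower case character present
report '[[:digit:]]' 'room 42'   # named class inside the brackets
report '[]x]' 'a]b'              # a leading ] is an ordinary member
report '[a-]' '1-2'              # a trailing - matches itself
report '[^[:alpha:]]' 'abc'      # negation: every character is a letter
report '[^[:alpha:]]' 'abc!'     # negation: "!" is not a letter
```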
^
- If `^' is the first character of a regular expression, then it
anchors the regular expression to the beginning of a line. Otherwise,
it matches itself.
$
- If `$' is the last character of a regular expression, it anchors
the regular expression to the end of a line. Otherwise, it matches
itself.
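A short sketch of the two anchors, including the positions where they lose their special meaning. GNU grep is assumed as a stand-in for ed's matcher, and the helper names are hypothetical.

```shell
# matches RE STRING: exit status 0 if STRING contains a match for RE.
matches() {
    printf '%s\n' "$2" | grep -q -e "$1"
}

# report RE STRING: print whether RE matches somewhere in STRING.
report() {
    if matches "$1" "$2"; then
        printf '%s matches "%s"\n' "$1" "$2"
    else
        printf '%s does not match "%s"\n' "$1" "$2"
    fi
}

report '^ab' 'abc'    # anchored at the beginning of the line: matches
report '^ab' 'cab'    # "ab" is present, but not at the beginning
report 'bc$' 'abc'    # anchored at the end of the line: matches
report 'bc$' 'bcd'    # "bc" is present, but not at the end
report 'a^b' 'a^b'    # ^ is not first, so it matches itself
report 'a$b' 'a$b'    # $ is not last, so it matches itself
```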
\(re\)
- Defines a (possibly null) subexpression re.
Subexpressions may be nested. A
subsequent backreference of the form `\n', where n is a
number in the range [1,9], expands to the text matched by the nth
subexpression. For example, the regular expression `\(a.c\)\1' matches
the string `abcabc', but not `abcadc'.
Subexpressions are ordered relative to their left delimiter.
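The backreference example above, and the numbering of nested subexpressions, can be checked with a sketch like this one. It assumes GNU grep and GNU sed as stand-ins for ed, and the function names are made up for the illustration.

```shell
# matches RE STRING: exit status 0 if STRING contains a match for RE.
matches() {
    printf '%s\n' "$2" | grep -q -e "$1"
}

# show_groups STRING: print the text captured by subexpressions 1 and 2
# of a nested pattern. The outer \( opens first, so it is number 1.
show_groups() {
    printf '%s\n' "$1" | sed -e 's/\(a\(b\)\)\2\1/[\1][\2]/'
}

matches '\(a.c\)\1' 'abcabc' && echo 'abcabc: matched'
matches '\(a.c\)\1' 'abcadc' || echo 'abcadc: not matched'
show_groups 'abbab'    # prints [ab][b]
```

In the nested pattern, `\1' is `ab' and `\2' is `b', so the whole expression matches `ab', then `b', then `ab' again.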
*
- Matches the single character regular expression or subexpression
immediately preceding it zero or more times. If `*' is the first
character of a regular expression or subexpression, then it matches
itself. The `*' operator sometimes yields unexpected results. For
example, the regular expression `b*' matches the beginning of the
string `abbb', as opposed to the substring `bbb', since a
null match is the only left-most match.
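The null-match behavior is easier to see when the selected text is marked. The sketch below assumes GNU sed as a stand-in for ed's `s' command. show_match is a hypothetical helper that brackets the match.

```shell
# show_match RE STRING: wrap the text selected by RE in [...].
# RE is pasted into the sed script, so it must not contain a '/'.
show_match() {
    printf '%s\n' "$2" | sed -e "s/$1/[&]/"
}

show_match 'b*' 'abbb'     # prints []abbb  (null match before the "a")
show_match 'bb*' 'abbb'    # prints a[bbb]  (at least one "b" is required)
show_match '*b' 'a*b'      # prints a[*b]   (a leading * matches itself)
```

Requiring one literal `b' before the starred one is a common way to avoid the empty match.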
\{n,m\}
\{n,\}
\{n\}
- Matches the single character regular expression or subexpression
immediately preceding it at least n and at most m times. If
m is omitted, then it matches at least n times. If the
comma is also omitted, then it matches exactly n times.
If any of these forms occurs first in a regular expression or subexpression,
then it is interpreted literally (i.e., the regular expression `\{2\}'
matches the string `{2}', and so on).
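The three interval forms can be compared side by side with a sketch such as the following, which assumes GNU grep as a stand-in for ed's matcher. The patterns are anchored at both ends so that the whole string has to fit. The helper name is hypothetical.

```shell
# matches RE STRING: exit status 0 if STRING contains a match for RE.
matches() {
    printf '%s\n' "$2" | grep -q -e "$1"
}

# Try strings with one to four b's against each interval form.
for s in abc abbc abbbc abbbbc; do
    for re in '^ab\{2,3\}c$' '^ab\{2,\}c$' '^ab\{2\}c$'; do
        if matches "$re" "$s"; then
            printf '%s matches %s\n' "$re" "$s"
        fi
    done
done
```

With these inputs, `abbc' satisfies all three forms, `abbbc' satisfies the first two, `abbbbc' satisfies only the open-ended form, and `abc' satisfies none.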
\<
\>
- Anchors the single character regular expression or subexpression
immediately following it to the beginning (in the case of `\<')
or ending (in the case of `\>') of
a word, i.e., in ASCII, a maximal string of alphanumeric characters,
including the underscore (_).
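The word anchors can be exercised with a sketch like the one below. It assumes GNU grep as a stand-in for ed's matcher, and the helper names are hypothetical.

```shell
# matches RE STRING: exit status 0 if STRING contains a match for RE.
matches() {
    printf '%s\n' "$2" | grep -q -e "$1"
}

# report RE STRING: print whether RE matches somewhere in STRING.
report() {
    if matches "$1" "$2"; then
        printf '%s matches "%s"\n' "$1" "$2"
    else
        printf '%s does not match "%s"\n' "$1" "$2"
    fi
}

report '\<cat\>' 'the cat sat'    # "cat" stands alone as a word
report '\<cat\>' 'concatenate'    # "cat" is buried inside a longer word
report '\<cat\>' 'my_cat'         # the underscore is part of the word
report '\<cat' 'catalog'          # only the beginning is anchored
report 'cat\>' 'bobcat'           # only the ending is anchored
report '\<cat' 'bobcat'           # "cat" does not begin a word here
```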
The following extended operators are preceded by a backslash `\' to
distinguish them from traditional ed syntax.
\`
\'
- Unconditionally matches the beginning (`\`') or ending (`\'') of a line.
\?
- Optionally matches the single character regular expression or subexpression
immediately preceding it. For example, the regular expression `a[bd]\?c'
matches the strings `abc', `adc' and `ac'.
If `\?' occurs at the beginning
of a regular expression or subexpression, then it matches a literal `?'.
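The `a[bd]\?c' example can be checked with the sketch below, which assumes GNU grep as a stand-in for ed's matcher. The pattern is anchored at both ends so that the whole string must fit, and the helper name is hypothetical.

```shell
# matches RE STRING: exit status 0 if STRING contains a match for RE.
matches() {
    printf '%s\n' "$2" | grep -q -e "$1"
}

# The bracket expression may appear once or not at all.
for s in abc adc ac abdc; do
    if matches '^a[bd]\?c$' "$s"; then
        printf '%s: matched\n' "$s"
    else
        printf '%s: not matched\n' "$s"
    fi
done
```

Only `abdc' is rejected, because it would need the optional element twice.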
\+
- Matches the single character regular expression or subexpression
immediately preceding it one or more times. So the regular expression
`a\+' is shorthand for `aa*'. If `\+' occurs at the
beginning of a regular expression or subexpression, then it matches a
literal `+'.
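The equivalence between `\+' and a doubled operand followed by `*' can be observed with the following sketch. It assumes GNU sed as a stand-in for ed's `s' command, and show_match is a hypothetical helper that brackets the selected text.

```shell
# show_match RE STRING: wrap the text selected by RE in [...].
# RE is pasted into the sed script, so it must not contain a '/'.
show_match() {
    printf '%s\n' "$2" | sed -e "s/$1/[&]/"
}

show_match 'a\+' 'baaac'    # prints b[aaa]c
show_match 'aa*' 'baaac'    # prints b[aaa]c  (same selection)
show_match 'a\+' 'bc'       # prints bc       (no "a", so no match)
show_match 'a*' 'bc'        # prints []bc     (the star accepts a null match)
```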
\b
- Matches the beginning or ending (null string) of a word. Thus the regular
expression `\bhello\b' is equivalent to `\<hello\>'.
However, `\b\b'
is a valid regular expression whereas `\<\>' is not.
\B
- Matches (a null string) inside a word.
\w
- Matches any character in a word.
\W
- Matches any character not in a word.
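The four word-related operators can be tried with a sketch along these lines. It assumes GNU grep and GNU sed as stand-ins for ed, and all function names are invented for the example.

```shell
# matches RE STRING: exit status 0 if STRING contains a match for RE.
matches() {
    printf '%s\n' "$2" | grep -q -e "$1"
}

# mask_words STRING: replace each run of word characters with "W".
mask_words() {
    printf '%s\n' "$1" | sed -e 's/\w\+/W/g'
}

# mask_nonwords STRING: replace each non-word character with "_".
mask_nonwords() {
    printf '%s\n' "$1" | sed -e 's/\W/_/g'
}

matches '\bhello\b' 'say hello there' && echo 'hello found as a word'
matches '\bhello\b' 'othello' || echo 'othello: no word boundary before hello'
matches '\Bcat' 'bobcat' && echo 'bobcat: cat starts inside a word'
matches '\Bcat' 'cat' || echo 'cat: cat starts at a word boundary'
mask_words 'foo_bar baz-42'    # prints W W-W
mask_nonwords 'a b-c'          # prints a_b_c
```

The underscore counts as a word character, so `foo_bar' is masked as a single word, while the hyphen separates `baz' from `42'.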