

sed, a stream editor


3.2 Overview of Regular Expression Syntax

To use sed effectively, you need to understand regular expressions (regexp for short). A regular expression is a pattern that is matched against a subject string from left to right. Most characters stand for themselves in a pattern and match the corresponding characters in the subject. As a trivial example, the pattern

 
     The quick brown fox

matches a portion of a subject string that is identical to itself. The power of regular expressions comes from the ability to include alternatives and repetitions in the pattern. These are encoded in the pattern by the use of metacharacters, which do not stand for themselves but instead are interpreted in some special way. Here is a brief description of regular expression syntax as used in sed.

char
A single character, if not special, is matched against text.

*
Matches zero or more repetitions of the preceding character, grouped regexp (see below), or bracketed list.

\+
As *, but matches one or more. It is a GNU extension.

\?
As *, but only matches zero or one. It is a GNU extension.

\{i\}
As *, but matches exactly i repetitions (i is a number; for portability, keep it between 0 and 255).

\{i,j\}
Matches between i and j repetitions, inclusive.

\{i,\}
Matches i or more repetitions.
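The repetition operators above can be tried by filtering a few sample lines with sed's -n option and p command, which together print only the matching lines. This is a minimal sketch assuming GNU sed (required for `\+' and `\?'); the sample words are made up, and each pattern is anchored with ^ and $ so that the whole line must match.

```shell
#!/bin/sh
# Sample input: b, ab, aab, aaab (one per line).

# a*b: zero or more a's, then b.
printf '%s\n' b ab aab aaab | sed -n '/^a*b$/p'        # prints b, ab, aab, aaab

# a\+b: one or more a's (GNU extension).
printf '%s\n' b ab aab aaab | sed -n '/^a\+b$/p'       # prints ab, aab, aaab

# a\?b: zero or one a (GNU extension).
printf '%s\n' b ab aab aaab | sed -n '/^a\?b$/p'       # prints b, ab

# a\{2\}b: exactly two a's.
printf '%s\n' b ab aab aaab | sed -n '/^a\{2\}b$/p'    # prints aab

# a\{1,2\}b: one or two a's.
printf '%s\n' b ab aab aaab | sed -n '/^a\{1,2\}b$/p'  # prints ab, aab

# a\{2,\}b: two or more a's.
printf '%s\n' b ab aab aaab | sed -n '/^a\{2,\}b$/p'   # prints aab, aaab
```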

\(regexp\)
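Groups the inner regexp as a whole. This is used to apply postfix operators such as * to the whole group (for example, `\(abcd\)*' matches zero or more whole sequences of `abcd', while `abcd*' matches `abc' followed by zero or more `d's), and to define back references (see \digit below).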
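To see the difference grouping makes, the sketch below wraps whatever the anchored regexp matches in brackets; in the replacement part of an s command, & stands for the matched text. It assumes GNU sed, and the input string is arbitrary.

```shell
#!/bin/sh
# \(ab\)* repeats the whole group `ab', so the leading `ababab' is matched.
echo 'abababc' | sed 's/^\(ab\)*/[&]/'   # prints [ababab]c

# ab* repeats only the `b': it matches `a' followed by zero or more b's.
echo 'abababc' | sed 's/^ab*/[&]/'      # prints [ab]ababc
```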

.
Matches any single character.

^
Matches the null string at the beginning of a line; that is, what follows the caret must appear at the start of the line. For example, `^#include' matches only lines where `#include' is the first thing on the line: if there are spaces before it, the match fails.

$
Same as ^, but matches the null string at the end of a line.
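The two anchors can be checked with a pair of filters; the input lines are made-up C fragments, and any POSIX sed should behave the same way here.

```shell
#!/bin/sh
# ^#include: only the line where #include comes first is printed.
printf '%s\n' '#include <stdio.h>' '  #include <stdlib.h>' | sed -n '/^#include/p'
# prints: #include <stdio.h>

# ;$: only lines whose last character is a semicolon are printed.
printf '%s\n' 'int x;' 'int y; // comment' | sed -n '/;$/p'
# prints: int x;
```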

[list]
[^list]
Matches any single character in list: for example, `[aeiou]' matches any vowel. A list may include ranges like `char1-char2', which match any character between char1 and char2, inclusive.

A leading caret reverses the meaning, so that `[^list]' matches any single character NOT in list. To include `]' in the list, make it the first character (after the caret, if any); to include `-', make it the first or last character; to include `^', put it anywhere except first.
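A short sketch of bracketed lists, using the s command with the g flag so that every match on the line is replaced; the input strings are arbitrary examples.

```shell
#!/bin/sh
# [aeiou] matches any single vowel.
echo 'regular expression' | sed 's/[aeiou]/_/g'   # prints r_g_l_r _xpr_ss__n

# [^0-9] matches any character that is not a digit.
echo 'tel: 555-0199' | sed 's/[^0-9]/./g'          # prints .....555.0199

# ] comes first and - comes last, so both are taken literally.
echo 'a]b-c' | sed 's/[]-]/+/g'                    # prints a+b+c
```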

regexp1\|regexp2
Matches either regexp1 or regexp2. Use \(...\) to group complex alternatives. When more than one alternative can match at the same starting position, the longest match is chosen, as described in the note on greediness below. It is a GNU extension.
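The sketch below, assuming GNU sed for \|, shows why grouping matters: \| has the lowest precedence, so without \(...\) the anchors attach to only one alternative each. The word list is made up.

```shell
#!/bin/sh
# Grouped: the whole line must be exactly `cat' or exactly `dog'.
printf '%s\n' cat dog bird catdog | sed -n '/^\(cat\|dog\)$/p'   # prints cat, dog

# Ungrouped: means "starts with cat" OR "ends with dog", so catdog matches too.
printf '%s\n' cat dog bird catdog | sed -n '/^cat\|dog$/p'       # prints cat, dog, catdog
```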

\digit
Matches the same text that was matched by the digit-th \(...\) group earlier in the regular expression (a back reference).
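A minimal sketch of back references. The first command keeps only lines of the made-up form `text=text'; the second, assuming GNU sed, also uses \1 in the replacement of an s command (which sed allows) to collapse a doubled word.

```shell
#!/bin/sh
# \1 must repeat exactly what \(.*\) captured before the `='.
printf '%s\n' 'abc=abc' 'abc=abd' | sed -n '/^\(.*\)=\1$/p'   # prints abc=abc

# Replace " is is " with a single " is ".
echo 'it is is here' | sed 's/ \([a-z]*\) \1 / \1 /'          # prints it is here
```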

\char
Matches the character char literally; use this to match the special characters described above. Note that the only C-like backslash sequence that you can portably assume to be interpreted is \n for a newline; in particular, \t matches a `t' under most implementations of sed, rather than a tab character.
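The effect of escaping a special character is easy to see with `.'. Since \t is not portable, one common workaround, sketched below, is to let the shell insert a literal tab into the sed script; the file names are made up.

```shell
#!/bin/sh
# \. matches only a literal dot.
printf '%s\n' 'file.txt' 'file_txt' | sed -n '/file\.txt/p'   # prints file.txt

# An unescaped . matches any character, so both lines are printed.
printf '%s\n' 'file.txt' 'file_txt' | sed -n '/file.txt/p'    # prints file.txt, file_txt

# Portable tab: the shell expands $tab to a real tab before sed sees the script.
tab=$(printf '\t')
printf 'a\tb\n' | sed "s/$tab/ /"                             # prints a b
```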

Note that the regular expression matcher is greedy: among all possible matches it chooses the one that starts earliest in the text, and among the matches that start at that position it selects the longest.
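The leftmost-longest rule can be observed by bracketing the text that a single s command replaces (& is the matched text). The inputs are arbitrary, and only POSIX operators are used.

```shell
#!/bin/sh
# The earliest match wins, even though a longer run of a's appears later.
echo 'xab aaaa' | sed 's/aa*/[&]/'        # prints x[a]b aaaa

# At that earliest position, the longest match is taken.
echo 'aaab' | sed 's/a*/[&]/'             # prints [aaa]b

# .* extends as far as possible, up to the last comma.
echo 'one,two,three' | sed 's/.*,/[&]/'   # prints [one,two,]three
```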

Examples:

`abcdef'
Matches `abcdef'.

`a*b'
Matches zero or more `a's followed by a single `b'. For example, `b' or `aaaaab'.

`a\?b'
Matches `b' or `ab'.

`a\+b\+'
Matches one or more `a's followed by one or more `b's: `ab' is the shortest possible match, but other examples are `aaaab' or `abbbbb' or `aaaaaabbbbbbb'.

`.*'
`.\+'
These two both match all the characters on a line; however, the first matches every line (including empty ones), while the second only matches lines containing at least one character.
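To see which lines each pattern selects, the sketch below uses sed's = command, which prints the line number of each matching line. The input has an empty second line; GNU sed is assumed for \+.

```shell
#!/bin/sh
# .* also matches the empty line 2.
printf '%s\n' first '' third | sed -n '/.*/='     # prints 1, 2, 3

# .\+ needs at least one character, so line 2 is skipped.
printf '%s\n' first '' third | sed -n '/.\+/='    # prints 1, 3
```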

`^main.*(.*)'
This matches a line that starts with `main' and later contains an opening parenthesis followed, somewhere after it, by a closing parenthesis. The `n', `(' and `)' need not be adjacent.

`^#'
This matches lines beginning with a hash (or sharp) character.
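Both of the previous examples can be run as filters; the input lines are made-up C and shell fragments. In a basic regexp, ( and ) are ordinary characters, so no escaping is needed.

```shell
#!/bin/sh
# Only the first line starts with `main' and has `(' followed later by `)'.
printf '%s\n' 'main (int argc, char **argv)' 'int main(void)' 'main' | sed -n '/^main.*(.*)/p'
# prints: main (int argc, char **argv)

# ^# selects lines whose first character is a hash.
printf '%s\n' '# comment' 'echo hi # trailing' | sed -n '/^#/p'
# prints: # comment
```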

`\\$'
This matches lines ending with a single backslash. The regexp needs two backslashes because the backslash itself must be escaped.

`\$'
By contrast, this matches lines containing a dollar sign anywhere, because the escaped `$' loses its end-of-line meaning.
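A sketch contrasting the two patterns. Single quotes keep the shell from interpreting the backslashes and the dollar sign before sed sees them; the sample lines are made up.

```shell
#!/bin/sh
# \\$ : a literal backslash at the end of the line.
printf '%s\n' 'continued \' 'price: $5' 'plain' | sed -n '/\\$/p'   # prints: continued \

# \$ : a literal dollar sign anywhere on the line.
printf '%s\n' 'continued \' 'price: $5' 'plain' | sed -n '/\$/p'    # prints: price: $5
```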

`[a-zA-Z0-9]'
This matches any single letter or digit.

`[^ tab]\+'
This matches one or more consecutive characters, none of which is a space or a tab (here tab stands for a literal tab character). Usually this means a word.
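Because the tab must be a real tab character inside the brackets, the sketch below builds it with printf in the shell and uses double quotes so that $tab is expanded. It assumes GNU sed for \+; the input line is arbitrary.

```shell
#!/bin/sh
tab=$(printf '\t')
# Wrap every run of non-space, non-tab characters in angle brackets.
printf 'alpha beta\tgamma\n' | sed "s/[^ $tab]\+/<&>/g"
# prints <alpha> <beta>, then a tab, then <gamma>
```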

`^\(.*\)\n\1$'
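This matches two identical lines joined by a newline, with no trailing newline. The pattern space contains such an embedded newline only after a command such as N has appended another line to it.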
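A minimal sketch: the N command appends the next input line to the pattern space, separated by a newline, so each pair of input lines can be tested against the pattern. The input is made up and has an even number of lines.

```shell
#!/bin/sh
# Pairs: (same, same) matches; (diff, other) does not.
printf '%s\n' same same diff other | sed -n 'N; /^\(.*\)\n\1$/p'
# prints `same' twice (the first pair)
```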

`.\{9\}A$'
This matches an A that is the last character on the line and is preceded by at least nine characters.

`^.\{15\}A'
This matches an A that is the 16th character on a line.
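Counting characters by hand is error-prone, so this sketch builds its test lines with printf's zero padding: `%015dA' with the argument 0 yields fifteen zeros followed by A. The inputs are illustrative only.

```shell
#!/bin/sh
# .\{9\}A$ : nine characters precede the final A on the first line only.
printf '%s\n' '123456789A' '123A' | sed -n '/.\{9\}A$/p'   # prints 123456789A

# ^.\{15\}A : A must be the 16th character.
sixteenth=$(printf '%015dA' 0)   # 15 zeros, then A in column 16
fifteenth=$(printf '%014dA' 0)   # 14 zeros, then A in column 15
printf '%s\n' "$sixteenth" "$fifteenth" | sed -n '/^.\{15\}A/p'   # prints only $sixteenth
```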



Copyright 2003 by The Free Software Foundation. Updated Jun 2003.