www.delorie.com/gnu/docs/elisp-manual-21/elisp_569.html   search  
Buy the book!

GNU Emacs Lisp Reference Manual

[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ] Backslash Constructs in Regular Expressions

For the most part, `\' followed by any character matches only that character. However, there are several exceptions: certain two-character sequences starting with `\' that have special meanings. (The character after the `\' in such a sequence is always ordinary when used on its own.) Here is a table of the special `\' constructs.

specifies an alternative. Two regular expressions a and b with `\|' in between form an expression that matches anything that either a or b matches.

Thus, `foo\|bar' matches either `foo' or `bar' but no other string.

`\|' applies to the largest possible surrounding expressions. Only a surrounding `\( ... \)' grouping can limit the grouping power of `\|'.

Full backtracking capability exists to handle multiple uses of `\|', if you use the POSIX regular expression functions (see section 34.4 POSIX Regular Expression Searching).

is a postfix operator that repeats the previous pattern exactly m times. Thus, `x\{5\}' matches the string `xxxxx' and nothing else. `c[ad]\{3\}r' matches string such as `caaar', `cdddr', `cadar', and so on.

is more general postfix operator that specifies repetition with a minimum of m repeats and a maximum of n repeats. If m is omitted, the minimum is 0; if n is omitted, there is no maximum.

For example, `c[ad]\{1,2\}r' matches the strings `car', `cdr', `caar', `cadr', `cdar', and `cddr', and nothing else.
`\{0,1\}' or `\{,1\}' is equivalent to `?'.
`\{0,\}' or `\{,\}' is equivalent to `*'.
`\{1,\}' is equivalent to `+'.

`\( ... \)'
is a grouping construct that serves three purposes:

  1. To enclose a set of `\|' alternatives for other operations. Thus, the regular expression `\(foo\|bar\)x' matches either `foox' or `barx'.

  2. To enclose a complicated expression for the postfix operators `*', `+' and `?' to operate on. Thus, `ba\(na\)*' matches `ba', `bana', `banana', `bananana', etc., with any number (zero or more) of `na' strings.

  3. To record a matched substring for future reference with `\digit' (see below).

This last application is not a consequence of the idea of a parenthetical grouping; it is a separate feature that was assigned as a second meaning to the same `\( ... \)' construct because, in pratice, there was usually no conflict between the two meanings. But occasionally there is a conflict, and that led to the introduction of shy groups.

`\(?: ... \)'
is the shy group construct. A shy group serves the first two purposes of an ordinary group (controlling the nesting of other operators), but it does not get a number, so you cannot refer back to its value with `\digit'.

Shy groups are particulary useful for mechanically-constructed regular expressions because they can be added automatically without altering the numbering of any ordinary, non-shy groups.

matches the same text that matched the digitth occurrence of a grouping (`\( ... \)') construct.

In other words, after the end of a group, the matcher remembers the beginning and end of the text matched by that group. Later on in the regular expression you can use `\' followed by digit to match that same text, whatever it may have been.

The strings matching the first nine grouping constructs appearing in the entire regular expression passed to a search or matching function are assigned numbers 1 through 9 in the order that the open parentheses appear in the regular expression. So you can use `\1' through `\9' to refer to the text matched by the corresponding grouping constructs.

For example, `\(.*\)\1' matches any newline-free string that is composed of two identical halves. The `\(.*\)' matches the first half, which may be anything, but the `\1' that follows must match the same exact text.

If a particular grouping construct in the regular expression was never matched--for instance, if it appears inside of an alternative that wasn't used, or inside of a repetition that repeated zero times--then the corresponding `\digit' construct never matches anything. To use an artificial example,, `\(foo\(b*\)\|lose\)\2' cannot match `lose': the second alternative inside the larger group matches it, but then `\2' is undefined and can't match anything. But it can match `foobb', because the first alternative matches `foob' and `\2' matches `b'.

matches any word-constituent character. The editor syntax table determines which characters these are. See section 35. Syntax Tables.

matches any character that is not a word constituent.

matches any character whose syntax is code. Here code is a character that represents a syntax code: thus, `w' for word constituent, `-' for whitespace, `(' for open parenthesis, etc. To represent whitespace syntax, use either `-' or a space character. See section 35.2.1 Table of Syntax Classes, for a list of syntax codes and the characters that stand for them.

matches any character whose syntax is not code.

matches any character whose category is c. Here c is a character that represents a category: thus, `c' for Chinese characters or `g' for Greek characters in the standard category table.

matches any character whose category is not c.

The following regular expression constructs match the empty string--that is, they don't use up any characters--but whether they match depends on the context.

matches the empty string, but only at the beginning of the buffer or string being matched against.

matches the empty string, but only at the end of the buffer or string being matched against.

matches the empty string, but only at point. (This construct is not defined when matching against a string.)

matches the empty string, but only at the beginning or end of a word. Thus, `\bfoo\b' matches any occurrence of `foo' as a separate word. `\bballs?\b' matches `ball' or `balls' as a separate word.

`\b' matches at the beginning or end of the buffer regardless of what text appears next to it.

matches the empty string, but not at the beginning or end of a word.

matches the empty string, but only at the beginning of a word. `\<' matches at the beginning of the buffer only if a word-constituent character follows.

matches the empty string, but only at the end of a word. `\>' matches at the end of the buffer only if the contents end with a word-constituent character.

Not every string is a valid regular expression. For example, a string with unbalanced square brackets is invalid (with a few exceptions, such as `[]]'), and so is a string that ends with a single `\'. If an invalid regular expression is passed to any of the search functions, an invalid-regexp error is signaled.

[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

  webmaster   donations   bookstore     delorie software   privacy  
  Copyright 2003   by The Free Software Foundation     Updated Jun 2003