| www.delorie.com/gnu/docs/emacs-lisp-intro/emacs-lisp-intro_186.html | search |
![]() Buy the book! | |
| [ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
sentence-end
The symbol sentence-end is bound to the pattern that marks the
end of a sentence. What should this regular expression be?
Clearly, a sentence may be ended by a period, a question mark, or an exclamation mark. Indeed, only clauses that end with one of those three characters should be considered the end of a sentence. This means that the pattern should include the character set:
[.?!] |
However, we do not want forward-sentence merely to jump to a
period, a question mark, or an exclamation mark, because such a character
might be used in the middle of a sentence. A period, for example, is
used after abbreviations. So other information is needed.
According to convention, you type two spaces after every sentence, but only one space after a period, a question mark, or an exclamation mark in the body of a sentence. So a period, a question mark, or an exclamation mark followed by two spaces is a good indicator of an end of sentence. However, in a file, the two spaces may instead be a tab or the end of a line. This means that the regular expression should include these three items as alternatives.
This group of alternatives will look like this:
\\($\\| \\| \\)
^ ^^
TAB SPC
|
Here, `$' indicates the end of the line, and I have pointed out where the tab and two spaces are inserted in the expression. Both are inserted by putting the actual characters into the expression.
Two backslashes, `\\', are required before the parentheses and vertical bars: the first backslash quotes the following backslash in Emacs; and the second indicates that the following character, the parenthesis or the vertical bar, is special.
Also, a sentence may be followed by one or more carriage returns, like this:
[ ]* |
Like tabs and spaces, a carriage return is inserted into a regular expression by inserting it literally. The asterisk indicates that the RET is repeated zero or more times.
But a sentence end does not consist only of a period, a question mark or an exclamation mark followed by appropriate space: a closing quotation mark or a closing brace of some kind may precede the space. Indeed more than one such mark or brace may precede the space. These require a expression that looks like this:
[]\"')}]* |
In this expression, the first `]' is the first character in the expression; the second character is `"', which is preceded by a `\' to tell Emacs the `"' is not special. The last three characters are `'', `)', and `}'.
All this suggests what the regular expression pattern for matching the
end of a sentence should be; and, indeed, if we evaluate
sentence-end we find that it returns the following value:
sentence-end
=> "[.?!][]\"')}]*\\($\\| \\| \\)[
]*"
|
| [ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
| webmaster donations bookstore | delorie software privacy |
| Copyright © 2003 by The Free Software Foundation | Updated Jun 2003 |