www.delorie.com/gnu/docs/emacs-lisp-intro/emacs-lisp-intro_186.html   search  
 
Buy the book!


Programming in Emacs Lisp

[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

12.1 The Regular Expression for sentence-end

The symbol sentence-end is bound to the pattern that marks the end of a sentence. What should this regular expression be?

Clearly, a sentence may be ended by a period, a question mark, or an exclamation mark. Indeed, only clauses that end with one of those three characters should be considered the end of a sentence. This means that the pattern should include the character set:

 
[.?!]

However, we do not want forward-sentence merely to jump to a period, a question mark, or an exclamation mark, because such a character might be used in the middle of a sentence. A period, for example, is used after abbreviations. So other information is needed.

According to convention, you type two spaces after every sentence, but only one space after a period, a question mark, or an exclamation mark in the body of a sentence. So a period, a question mark, or an exclamation mark followed by two spaces is a good indicator of an end of sentence. However, in a file, the two spaces may instead be a tab or the end of a line. This means that the regular expression should include these three items as alternatives.

This group of alternatives will look like this:

 
\\($\\| \\|  \\)
       ^   ^^
      TAB  SPC

Here, `$' indicates the end of the line, and I have pointed out where the tab and two spaces are inserted in the expression. Both are inserted by putting the actual characters into the expression.

Two backslashes, `\\', are required before the parentheses and vertical bars: the first backslash quotes the following backslash in Emacs; and the second indicates that the following character, the parenthesis or the vertical bar, is special.

Also, a sentence may be followed by one or more carriage returns, like this:

 
[
]*

Like tabs and spaces, a carriage return is inserted into a regular expression by inserting it literally. The asterisk indicates that the RET is repeated zero or more times.

But a sentence end does not consist only of a period, a question mark or an exclamation mark followed by appropriate space: a closing quotation mark or a closing brace of some kind may precede the space. Indeed more than one such mark or brace may precede the space. These require a expression that looks like this:

 
[]\"')}]*

In this expression, the first `]' is the first character in the expression; the second character is `"', which is preceded by a `\' to tell Emacs the `"' is not special. The last three characters are `'', `)', and `}'.

All this suggests what the regular expression pattern for matching the end of a sentence should be; and, indeed, if we evaluate sentence-end we find that it returns the following value:

 
sentence-end
     => "[.?!][]\"')}]*\\($\\|     \\|  \\)[
]*"


[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

  webmaster   donations   bookstore     delorie software   privacy  
  Copyright 2003   by The Free Software Foundation     Updated Jun 2003