www.delorie.com/gnu/docs/elisp-manual-21/elisp_22.html   search  
Buy the book!

GNU Emacs Lisp Reference Manual

[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

2.3.3 Character Type

A character in Emacs Lisp is nothing more than an integer. In other words, characters are represented by their character codes. For example, the character A is represented as the integer 65.

Individual characters are not often used in programs. It is far more common to work with strings, which are sequences composed of characters. See section 2.3.8 String Type.

Characters in strings, buffers, and files are currently limited to the range of 0 to 524287--nineteen bits. But not all values in that range are valid character codes. Codes 0 through 127 are ASCII codes; the rest are non-ASCII (see section 33. Non-ASCII Characters). Characters that represent keyboard input have a much wider range, to encode modifier keys such as Control, Meta and Shift.

Since characters are really integers, the printed representation of a character is a decimal number. This is also a possible read syntax for a character, but writing characters that way in Lisp programs is a very bad idea. You should always use the special read syntax formats that Emacs Lisp provides for characters. These syntax formats start with a question mark.

The usual read syntax for alphanumeric characters is a question mark followed by the character; thus, `?A' for the character A, `?B' for the character B, and `?a' for the character a.

For example:

?Q => 81     ?q => 113

You can use the same syntax for punctuation characters, but it is often a good idea to add a `\' so that the Emacs commands for editing Lisp code don't get confused. For example, `?\ ' is the way to write the space character. If the character is `\', you must use a second `\' to quote it: `?\\'.

You can express the characters Control-g, backspace, tab, newline, vertical tab, formfeed, return, del, and escape as `?\a', `?\b', `?\t', `?\n', `?\v', `?\f', `?\r', `?\d', and `?\e', respectively. Thus,

?\a => 7                 ; C-g
?\b => 8                 ; backspace, BS, C-h
?\t => 9                 ; tab, TAB, C-i
?\n => 10                ; newline, C-j
?\v => 11                ; vertical tab, C-k
?\f => 12                ; formfeed character, C-l
?\r => 13                ; carriage return, RET, C-m
?\e => 27                ; escape character, ESC, C-[
?\\ => 92                ; backslash character, \
?\d => 127               ; delete character, DEL

These sequences which start with backslash are also known as escape sequences, because backslash plays the role of an escape character; this usage has nothing to do with the character ESC.

Control characters may be represented using yet another read syntax. This consists of a question mark followed by a backslash, caret, and the corresponding non-control character, in either upper or lower case. For example, both `?\^I' and `?\^i' are valid read syntax for the character C-i, the character whose value is 9.

Instead of the `^', you can use `C-'; thus, `?\C-i' is equivalent to `?\^I' and to `?\^i':

?\^I => 9     ?\C-I => 9

In strings and buffers, the only control characters allowed are those that exist in ASCII; but for keyboard input purposes, you can turn any character into a control character with `C-'. The character codes for these non-ASCII control characters include the 2**26 bit as well as the code for the corresponding non-control character. Ordinary terminals have no way of generating non-ASCII control characters, but you can generate them straightforwardly using X and other window systems.

For historical reasons, Emacs treats the DEL character as the control equivalent of ?:

?\^? => 127     ?\C-? => 127

As a result, it is currently not possible to represent the character Control-?, which is a meaningful input character under X, using `\C-'. It is not easy to change this, as various Lisp files refer to DEL in this way.

For representing control characters to be found in files or strings, we recommend the `^' syntax; for control characters in keyboard input, we prefer the `C-' syntax. Which one you use does not affect the meaning of the program, but may guide the understanding of people who read it.

A meta character is a character typed with the META modifier key. The integer that represents such a character has the 2**27 bit set (which on most machines makes it a negative number). We use high bits for this and other modifiers to make possible a wide range of basic character codes.

In a string, the 2**7 bit attached to an ASCII character indicates a meta character; thus, the meta characters that can fit in a string have codes in the range from 128 to 255, and are the meta versions of the ordinary ASCII characters. (In Emacs versions 18 and older, this convention was used for characters outside of strings as well.)

The read syntax for meta characters uses `\M-'. For example, `?\M-A' stands for M-A. You can use `\M-' together with octal character codes (see below), with `\C-', or with any other syntax for a character. Thus, you can write M-A as `?\M-A', or as `?\M-\101'. Likewise, you can write C-M-b as `?\M-\C-b', `?\C-\M-b', or `?\M-\002'.

The case of a graphic character is indicated by its character code; for example, ASCII distinguishes between the characters `a' and `A'. But ASCII has no way to represent whether a control character is upper case or lower case. Emacs uses the 2**25 bit to indicate that the shift key was used in typing a control character. This distinction is possible only when you use X terminals or other special terminals; ordinary terminals do not report the distinction to the computer in any way. The Lisp syntax for the shift bit is `\S-'; thus, `?\C-\S-o' or `?\C-\S-O' represents the shifted-control-o character.

The X Window System defines three other modifier bits that can be set in a character: hyper, super and alt. The syntaxes for these bits are `\H-', `\s-' and `\A-'. (Case is significant in these prefixes.) Thus, `?\H-\M-\A-x' represents Alt-Hyper-Meta-x. Numerically, the bit values are 2**22 for alt, 2**23 for super and 2**24 for hyper.

Finally, the most general read syntax for a character represents the character code in either octal or hex. To use octal, write a question mark followed by a backslash and the octal character code (up to three octal digits); thus, `?\101' for the character A, `?\001' for the character C-a, and ?\002 for the character C-b. Although this syntax can represent any ASCII character, it is preferred only when the precise octal value is more important than the ASCII representation.

?\012 => 10         ?\n => 10         ?\C-j => 10
?\101 => 65         ?A => 65

To use hex, write a question mark followed by a backslash, `x', and the hexadecimal character code. You can use any number of hex digits, so you can represent any character code in this way. Thus, `?\x41' for the character A, `?\x1' for the character C-a, and ?\x8e0 for the Latin-1 character `a' with grave accent.

A backslash is allowed, and harmless, preceding any character without a special escape meaning; thus, `?\+' is equivalent to `?+'. There is no reason to add a backslash before most characters. However, you should add a backslash before any of the characters `()\|;'`"#.,' to avoid confusing the Emacs commands for editing Lisp code. Also add a backslash before whitespace characters such as space, tab, newline and formfeed. However, it is cleaner to use one of the easily readable escape sequences, such as `\t', instead of an actual whitespace character such as a tab.

[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

  webmaster   donations   bookstore     delorie software   privacy  
  Copyright 2003   by The Free Software Foundation     Updated Jun 2003