www.delorie.com/gnu/docs/elisp-manual-21/elisp_31.html   search  
 
Buy the book!


GNU Emacs Lisp Reference Manual

[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

2.3.8.2 Non-ASCII Characters in Strings

You can include a non-ASCII international character in a string constant by writing it literally. There are two text representations for non-ASCII characters in Emacs strings (and in buffers): unibyte and multibyte. If the string constant is read from a multibyte source, such as a multibyte buffer or string, or a file that would be visited as multibyte, then the character is read as a multibyte character, and that makes the string multibyte. If the string constant is read from a unibyte source, then the character is read as unibyte and that makes the string unibyte.

You can also represent a multibyte non-ASCII character with its character code: use a hex escape, `\xnnnnnnn', with as many digits as necessary. (Multibyte non-ASCII character codes are all greater than 256.) Any character which is not a valid hex digit terminates this construct. If the next character in the string could be interpreted as a hex digit, write `\ ' (backslash and space) to terminate the hex escape--for example, `\x8e0\ ' represents one character, `a' with grave accent. `\ ' in a string constant is just like backslash-newline; it does not contribute any character to the string, but it does terminate the preceding hex escape.

Using a multibyte hex escape forces the string to multibyte. You can represent a unibyte non-ASCII character with its character code, which must be in the range from 128 (0200 octal) to 255 (0377 octal). This forces a unibyte string. See section 33.1 Text Representations, for more information about the two text representations.


  webmaster   donations   bookstore     delorie software   privacy  
  Copyright 2003   by The Free Software Foundation     Updated Jun 2003