The recode reference manual


5.5 Universal Transformation Format, 16 bits

UTF-16 is another external surface of UCS. It is also variable length, with each character using either two or four bytes. It covers only the subset formed by the first 17 * 2^16 = 1,114,112 code positions of UCS, that is, roughly the first million characters.
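The two figures in the paragraph above can be checked with a short Python sketch. It is only an illustration and is not part of recode: it relies on Python's built-in `utf-16-be` codec, and the names `TOTAL_CODE_POSITIONS` and `utf16_byte_length` are made up for this example.

```python
# Illustrative sketch only; these names are hypothetical and not part of recode.

# 17 planes of 2**16 positions each: the range UTF-16 is able to address.
TOTAL_CODE_POSITIONS = 17 * 2**16


def utf16_byte_length(ch: str) -> int:
    """Return how many bytes one character takes in UTF-16.

    The big-endian codec is used so that no byte order mark is
    prepended to the result.
    """
    return len(ch.encode("utf-16-be"))


# A character of the first plane takes two bytes, any other takes four.
print(TOTAL_CODE_POSITIONS)
print(utf16_byte_length("A"))
print(utf16_byte_length("\U00010000"))
```

The last code position counted this way is 0x10FFFF, so `TOTAL_CODE_POSITIONS` equals 0x110000.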

Martin J. Dürst writes (to comp.std.internat, on 1995-03-28):

UTF-16 is another method that reserves two times 1024 codepoints in Unicode and uses them to index around one million additional characters. UTF-16 is a little bit like former multibyte codes, but quite not so, as both the first and the second 16-bit code clearly show what they are. The idea is that one million codepoints should be enough for all the rare Chinese ideograms and historical scripts that do not fit into the Base Multilingual Plane of ISO 10646 (with just about 63,000 positions available, now that 2,000 are gone).
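The mechanism described in this quotation can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code recode uses: the two reserved blocks of 1024 codepoints are taken to be 0xD800..0xDBFF and 0xDC00..0xDFFF, as in the published definition of UTF-16, and the names `to_utf16_units` and `classify_unit` are hypothetical.

```python
# Illustrative sketch only; not taken from the recode sources.
from typing import List

HIGH_BASE = 0xD800  # first of the 1024 codes reserved for the first unit
LOW_BASE = 0xDC00   # first of the 1024 codes reserved for the second unit


def to_utf16_units(code_point: int) -> List[int]:
    """Split one UCS code point into one or two 16-bit UTF-16 codes."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("outside the range UTF-16 can represent")
    if HIGH_BASE <= code_point <= 0xDFFF:
        raise ValueError("reserved codes are not characters by themselves")
    if code_point < 0x10000:
        # Characters of the first plane are stored as a single code.
        return [code_point]
    # The 1024 * 1024 additional characters are indexed by a 20-bit
    # offset: its top 10 bits select the first code, its low 10 bits
    # select the second code.
    offset = code_point - 0x10000
    return [HIGH_BASE + (offset >> 10), LOW_BASE + (offset & 0x3FF)]


def classify_unit(unit: int) -> str:
    """Tell what a 16-bit code is from its value alone."""
    if HIGH_BASE <= unit <= 0xDBFF:
        return "first of a pair"
    if LOW_BASE <= unit <= 0xDFFF:
        return "second of a pair"
    return "single character"
```

For example, `to_utf16_units(0x1F600)` yields the pair `[0xD83D, 0xDE00]`. Because the two reserved ranges do not overlap with each other or with ordinary characters, `classify_unit` needs no context, which is what the quotation means by both codes clearly showing what they are.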

This charset is available in recode under the name UTF-16. Accepted aliases are Unicode, TF-16 and u6.


Copyright © 2003 by The Free Software Foundation. Updated Jun 2003.