The recode reference manual


5.7 Fully interpreted UCS dump

Another device may be used to get fully interpreted dumps of a UCS-2 stream of characters, with each UCS-2 character displayed on its own output line. Each line gives the UCS-2 value of the character, its RFC 1345 mnemonic if one exists, and a descriptive comment for that character. Because each input character produces its own output line, the output file from this conversion may be much, much bigger than the input file.

This charset is available in recode under the name dump-with-names.

This dump-with-names feature has been implemented as a charset rather than a surface. This is surely debatable. The current implementation allows for dumping charsets other than UCS-2. For example, the command `recode l2..full < input' requires an implicit conversion from Latin-2 to UCS-2, as dump-with-names is only connected out from UCS-2. In such cases, recode does not display the original Latin-2 codes in the dump, only the corresponding UCS-2 values. To give a simpler example, the command

 
echo 'Hello, world!' | recode us..dump

produces the following output:

 
UCS2   Mne   Description

0048   H     latin capital letter h
0065   e     latin small letter e
006C   l     latin small letter l
006C   l     latin small letter l
006F   o     latin small letter o
002C   ,     comma
0020   SP    space
0077   w     latin small letter w
006F   o     latin small letter o
0072   r     latin small letter r
006C   l     latin small letter l
0064   d     latin small letter d
0021   !     exclamation mark
000A   LF    line feed (lf)
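As a rough check on how large such a dump gets, the sketch below predicts the number of output lines for single-byte input. It does not call recode. The helper name `expected_dump_lines` is made up for illustration, and it assumes one character per input byte (true for ASCII) plus the two title lines seen in the sample above.

```shell
# Hypothetical helper: predicts the number of lines in a dump of single-byte input.
# Assumption: every input byte is one character (true for ASCII), and the dump
# starts with two title lines (the header and a blank line), as in the sample.
expected_dump_lines() {
    # wc -c counts bytes on standard input; tr strips padding some wc versions add.
    bytes=$(wc -c | tr -d ' ')
    echo $((bytes + 2))
}

# 14 input bytes (13 characters plus the newline) give 14 data lines plus 2 title lines.
printf 'Hello, world!\n' | expected_dump_lines
```

For the `Hello, world!' input this prints 16, matching the sixteen lines of the sample dump, where a 14-byte input grew into sixteen lines of text.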

The descriptive comment is given in English, using ASCII. If no English description is available but a French one is, the French description is given instead, using Latin-1. However, if the LANGUAGE or LANG environment variable begins with the letters `fr', then French is preferred whenever both descriptions are available.
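The preference rule can be sketched as a small shell function. This is only an illustration of the rule as stated, not recode's own code: the name `preferred_description_language` is hypothetical, and the sketch assumes that LANGUAGE is consulted first and LANG only when LANGUAGE is unset or empty, which the text above does not specify.

```shell
# Hypothetical helper: reports which description language would be preferred
# when both an English and a French description exist.
# Assumption: LANGUAGE takes precedence; LANG is used only if LANGUAGE is
# unset or empty. The manual does not say how the two variables interact.
preferred_description_language() {
    locale_hint=${LANGUAGE:-${LANG:-}}
    case $locale_hint in
        fr*) echo French ;;   # value begins with the letters "fr"
        *)   echo English ;;  # anything else, including no setting at all
    esac
}

# Report the preference for the current environment.
preferred_description_language
```

Running the function in a subshell with `LANG=fr_FR' set (and LANGUAGE unset) prints French; with neither variable set it prints English.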

Here is another example. To get the long description of the code 237 in the Latin-5 table, one may use the following command:

 
echo -n 237 | recode l5/d..dump

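If your echo does not grok `-n', use `printf 237' instead, or `echo "237\c"' with an echo that interprets `\c' (the quotes are needed, as an unquoted backslash is removed by the shell). Here is how to see what Unicode U+03C6 means, while getting rid of the title lines.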

 
echo -n 0x03C6 | recode u2/x2..dump | tail -n +3
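The title-stripping step can be tried without recode installed. The sketch below pipes a saved copy of the first lines of the `Hello, world!' dump through `tail -n +3', which starts output at the third line. The function names `sample_dump` and `strip_dump_titles` are made up for illustration.

```shell
# Hypothetical stand-in for recode output: the first lines of the sample dump.
sample_dump() {
    printf '%s\n' \
        'UCS2   Mne   Description' \
        '' \
        '0048   H     latin capital letter h' \
        '0065   e     latin small letter e'
}

# Drop the title line and the blank line after it; output starts at line 3.
strip_dump_titles() {
    tail -n +3
}

# Prints only the two data lines.
sample_dump | strip_dump_titles
```

The same filter can be placed after any `..dump' conversion when only the per-character lines are wanted.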



Copyright © 2003 by The Free Software Foundation. Updated Jun 2003.