www.delorie.com/gnu/docs/recode/recode_2.html   search  
 
Buy GNU books!


The recode reference manual

[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

2. Terminology and purpose

A few terms are used over and over in this manual, our wise reader will learn their meaning right away. Both ISO (International Organization for Standardisation) and IETF (Internet Engineering Task Force) have their own terminology, this document does not try to stick to either one in a strict way, while it does not want to throw more confusion in the field. On the other hand, it would not be efficient using paraphrases all the time, so recode coins a few short words, which are explained below.

A charset, in the context of recode, is a particular association between computer codes on one side, and a repertoire of intended characters on the other side. Codes are usually taken from a set of consecutive small integers, starting at 0. Some characters have a graphical appearance (glyph) or displayable effect, others have special uses like, for example, to control devices or to interact with neighbouring codes to specify them more precisely. So, a charset is roughly one of those tables, giving a meaning to each of the codes from the set of allowable values. MIME also uses the term charset with approximately the same meaning. It does not exactly corresponds to what ISO calls a coded character set, that is, a set of characters with an encoding for them. An coded character set does not necessarily use all available code positions, while a MIME charset usually tries to specify them all. A MIME charset might be the union of a few disjoint coded character sets.

A surface is a term used in recode only, and is a short for surface transformation of a charset stream. This is any kind of mapping, usually reversible, which associates physical bits in some medium for a stream of characters taken from one or more charsets (usually one). A surface is a kind of varnish added over a charset so it fits in actual bits and bytes. How end of lines are exactly encoded is not really pertinent to the charset, and so, there is surface for end of lines. Base64 is also a surface, as we may encode any charset in it. Other examples would DES enciphering, or gzip compression (even if recode does not offer them currently): these are ways to give a real life to theoretical charsets. The trivial surface consists into putting characters into fixed width little chunks of bits, usually eight such bits per character. But things are not always that simple.

This recode library, and the program by that name, have the purpose of converting files between various charsets and surfaces. When this cannot be done in exact ways, as it is often the case, the program may get rid of the offending characters or fall back on approximations. This library recognises or produces around 175 such charsets under 500 names, and handle a dozen surfaces. Since it can convert each charset to almost any other one, many thousands of different conversions are possible.

The recode program and library do not usually know how to split and sort out textual and non-textual information which may be mixed in a single input file. For example, there is no surface which currently addresses the problem of how lines are blocked into physical records, when the blocking information is added as binary markers or counters within files. So, recode should be given textual streams which are rather pure.

This tool pays special attention to superimposition of diacritics for some French representations. This orientation is mostly historical, it does not impair the usefulness, generality or extensibility of the program. `recode' is both a French and English word. For those who pay attention to those things, the proper pronunciation is French (that is, `racud', with `a' like in `above', and `u' like in `cut').

The program recode has been written by François Pinard. With time, it got to reuse works from other contributors, and notably, those of Keld Simonsen and Bruno Haible.

2.1 Overview of charsets  
2.2 Overview of surfaces  
2.3 Contributions and bug reports  


[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

  webmaster   donations   bookstore     delorie software   privacy  
  Copyright © 2003   by The Free Software Foundation     Updated Jun 2003  

Please take a moment to fill out this visitor survey
You can help support this site by visiting the advertisers that sponsor it! (only once each, though)