recode reference manual
A few terms are used over and over in this manual, and our wise reader will
learn their meaning right away. Both ISO (International Organization for
Standardization) and IETF (Internet Engineering Task Force) have their
own terminology. This document does not try to stick strictly to either
one, yet it does not want to add more confusion to the field.
On the other hand, using paraphrases all the time would not be efficient,
so recode coins a few short words, which are explained below.
A charset, in the context of recode, is a particular association
between computer codes on one side, and a repertoire of intended characters
on the other side. Codes are usually taken from a set of consecutive
small integers, starting at 0. Some characters have a graphical appearance
(glyph) or displayable effect, others have special uses like, for example,
to control devices or to interact with neighbouring codes to specify them
more precisely. So, a charset is roughly one of those tables,
giving a meaning to each of the codes from the set of allowable values.
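The idea of a charset as a table from codes to characters can be illustrated with a minimal Python sketch. It uses Python's built-in codecs rather than recode itself, and the helper name `charset_table` is made up for this illustration. Codes to which a charset gives no meaning are shown as `None`.

```python
def charset_table(name, codes=range(256)):
    """Return {code: character or None} for a single-byte charset.

    'name' is a Python codec name (an assumption of this sketch; recode
    has its own charset names). None marks a code the charset leaves
    unassigned.
    """
    table = {}
    for code in codes:
        try:
            table[code] = bytes([code]).decode(name)
        except UnicodeDecodeError:
            table[code] = None
    return table


ascii_table = charset_table("ascii")
latin1_table = charset_table("latin-1")
cp437_table = charset_table("cp437")

# The same code can mean different things in different charsets,
# or nothing at all.
for label, table in [("ascii", ascii_table),
                     ("latin-1", latin1_table),
                     ("cp437", cp437_table)]:
    print(label, hex(0xE9), repr(table[0xE9]))
```

The same small integer is thus only meaningful once the charset is known, which is why a conversion always has to name both its source and its target.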
MIME also uses the term charset with approximately the same meaning.
It does not exactly correspond to what ISO calls a coded
character set, that is, a set of characters with an encoding for them.
A coded character set does not necessarily use all available code positions,
while a MIME charset usually tries to specify them all. A MIME charset
might be the union of a few disjoint coded character sets.
A surface is a term used in recode only, and is short for
surface transformation of a charset stream. This is any kind of mapping,
usually reversible, which associates physical bits in some medium with
a stream of characters taken from one or more charsets (usually one).
A surface is a kind of varnish added over a charset so it fits in actual
bits and bytes. How ends of lines are exactly encoded is not really
pertinent to the charset, and so there is a surface for ends of lines.
Base64 is also a surface, as we may encode any charset in it.
Other examples would be DES enciphering or gzip compression
(even if recode does not offer them currently): these are ways to give
a real life to theoretical charsets. The trivial surface consists
of putting characters into fixed-width little chunks of bits, usually
eight such bits per character. But things are not always that simple.
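To make the distinction concrete, here is a small Python sketch in which the charset (Latin-1) stays the same while two surfaces, an end-of-line convention and Base64, are layered over it and then peeled off again. The function names are hypothetical and this is not how recode implements surfaces; it only illustrates that a surface is a reversible change of representation.

```python
import base64


def add_crlf_surface(data: bytes) -> bytes:
    """Encode each end of line (LF) as the two bytes CR LF.

    Assumes the input contains no CR LF pairs already, otherwise the
    mapping would not be reversible.
    """
    return data.replace(b"\n", b"\r\n")


def remove_crlf_surface(data: bytes) -> bytes:
    """Undo add_crlf_surface."""
    return data.replace(b"\r\n", b"\n")


def add_base64_surface(data: bytes) -> bytes:
    """Wrap arbitrary bytes in Base64, whatever charset they carry."""
    return base64.b64encode(data)


def remove_base64_surface(data: bytes) -> bytes:
    """Undo add_base64_surface."""
    return base64.b64decode(data)


text = "déjà vu\nfin\n"

# Charset step: Latin-1 with the trivial surface, one byte per character.
plain = text.encode("latin-1")

# Surface steps: the characters do not change, only their representation.
wrapped = add_base64_surface(add_crlf_surface(plain))

# Removing the surfaces in reverse order gives back the same bytes.
restored = remove_crlf_surface(remove_base64_surface(wrapped))
print(restored == plain)
```

Surfaces compose, so they have to be removed in the reverse of the order in which they were applied.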
This recode library, and the program by that name, have the purpose
of converting files between various charsets and surfaces. When this
cannot be done in exact ways, as is often the case, the program may
get rid of the offending characters or fall back on approximations.
This library recognises or produces around 175 such charsets under 500
names, and handles a dozen surfaces. Since it can convert each charset to
almost any other one, many thousands of different conversions are possible.
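The two fallbacks mentioned above, dropping offending characters or approximating them, can be sketched in Python as follows. This is only an illustration under stated assumptions: the function `convert` and its `approximate` parameter are invented here, the approximation simply strips diacritics through Unicode decomposition, and recode's own fallback tables are not reproduced.

```python
import unicodedata


def convert(data: bytes, source: str, target: str,
            approximate: bool = True) -> bytes:
    """Convert bytes from one charset to another, character by character.

    Characters the target cannot represent are either approximated by
    their base letter (approximate=True) or silently dropped.
    Assumes a stateless target codec such as ASCII, Latin-1 or UTF-8.
    """
    out = bytearray()
    for ch in data.decode(source):
        try:
            out += ch.encode(target)      # exact conversion is possible
            continue
        except UnicodeEncodeError:
            pass
        if approximate:
            # Decompose into base letter plus combining marks, then keep
            # only the parts that are not combining marks.
            decomposed = unicodedata.normalize("NFD", ch)
            base = "".join(c for c in decomposed
                           if not unicodedata.combining(c))
            out += base.encode(target, errors="ignore")
        # Otherwise the offending character is dropped.
    return bytes(out)


latin1_bytes = "François".encode("latin-1")
print(convert(latin1_bytes, "latin-1", "ascii"))
print(convert(latin1_bytes, "latin-1", "ascii", approximate=False))
print(convert(latin1_bytes, "latin-1", "utf-8"))
```

Only the last conversion is exact; the first two lose information, so converting their output back to Latin-1 would not restore the original text.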
The recode program and library do not usually know how to split and
sort out textual and non-textual information which may be mixed in a single
input file. For example, there is no surface which currently addresses the
problem of how lines are blocked into physical records, when the blocking
information is added as binary markers or counters within files. So,
recode should be given textual streams which are rather pure.
This tool pays special attention to superimposition of diacritics for some French representations. This orientation is mostly historical, and it does not impair the usefulness, generality or extensibility of the program. `recode' is both a French and an English word. For those who pay attention to such things, the proper pronunciation is French (that is, `racud', with `a' as in `above', and `u' as in `cut').
The program recode was written by François Pinard.
Over time, it came to reuse work from other contributors, notably
that of Keld Simonsen and Bruno Haible.
2.1 Overview of charsets
2.2 Overview of surfaces
2.3 Contributions and bug reports
| Copyright © 2003 by The Free Software Foundation | Updated Jun 2003 |