| www.delorie.com/gnu/docs/recode/recode_64.html | search |
![]() Buy GNU books! | |
recode reference manual| [ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
The recode mechanics slowly evolved for many years, and it
would be tedious to explain all problems I met and mistakes I did all
along, yielding the current behaviour. Surely, one of the key choices
was to stop trying to do all conversions in memory, one line or one
buffer at a time. It has been fruitful to use the character stream
paradigm, and the elementary recoding steps now convert a whole stream
to another. Most of the control complexity in recode exists
so that each elementary recoding step stays simple, making easier
to add new ones. The whole point of recode, as I see it, is
providing a comfortable nest for growing new charset conversions.
The main recode driver constructs, while initialising all
conversion modules, a table giving all the conversion routines
available (single steps) and for each, the starting charset and
the ending charset. If we consider these charsets as being the nodes
of a directed graph, each single step may be considered as oriented
arc from one node to the other. A cost is attributed to each arc:
for example, a high penalty is given to single steps which are prone
to losing characters, a lower penalty is given to those which need
studying more than one input character for producing an output
character, etc.
Given a starting code and a goal code, recode computes the most
economical route through the elementary recodings, that is, the best
sequence of conversions that will transform the input charset into the
final charset. To speed up execution, recode looks for
subsequences of conversions which are simple enough to be merged, and
then dynamically creates new single steps to represent these mergings.
A double step in recode is a special concept representing a
sequence of two single steps, the output of the first single step being the
special charset UCS-2, the input of the second single step being
also UCS-2. Special recode machinery dynamically produces
efficient, reversible, merge-able single steps out of these double steps.
I made some statistics about how many internal recoding steps are required
between any two charsets chosen at random. The initial recoding layout,
before optimisation, always uses between 1 and 5 steps. Optimisation could
sometimes produce mere copies, which are counted as no steps at all.
In other cases, optimisation is unable to save any step. The number of
steps after optimisation is currently between 0 and 5 steps. Of course,
the expected number of steps is affected by optimisation: it drops
from 2.8 to 1.8. This means that recode uses a theoretical average
of a bit less than one step per recoding job. This looks good. This was
computed using reversible recodings. In strict mode, optimisation might
be defeated somewhat. Number of steps run between 1 and 6, both before
and after optimisation, and the expected number of steps decreases by a
lesser amount, going from 2.2 to 1.3. This is still manageable.
| [ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
| webmaster donations bookstore | delorie software privacy |
| Copyright © 2003 by The Free Software Foundation | Updated Jun 2003 |