www.delorie.com/gnu/docs/recode/recode_64.html   search  
 
Buy GNU books!


The recode reference manual

[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

14.1 Overall organisation

The recode mechanics slowly evolved for many years, and it would be tedious to explain all problems I met and mistakes I did all along, yielding the current behaviour. Surely, one of the key choices was to stop trying to do all conversions in memory, one line or one buffer at a time. It has been fruitful to use the character stream paradigm, and the elementary recoding steps now convert a whole stream to another. Most of the control complexity in recode exists so that each elementary recoding step stays simple, making easier to add new ones. The whole point of recode, as I see it, is providing a comfortable nest for growing new charset conversions.

The main recode driver constructs, while initialising all conversion modules, a table giving all the conversion routines available (single steps) and for each, the starting charset and the ending charset. If we consider these charsets as being the nodes of a directed graph, each single step may be considered as oriented arc from one node to the other. A cost is attributed to each arc: for example, a high penalty is given to single steps which are prone to losing characters, a lower penalty is given to those which need studying more than one input character for producing an output character, etc.

Given a starting code and a goal code, recode computes the most economical route through the elementary recodings, that is, the best sequence of conversions that will transform the input charset into the final charset. To speed up execution, recode looks for subsequences of conversions which are simple enough to be merged, and then dynamically creates new single steps to represent these mergings.

A double step in recode is a special concept representing a sequence of two single steps, the output of the first single step being the special charset UCS-2, the input of the second single step being also UCS-2. Special recode machinery dynamically produces efficient, reversible, merge-able single steps out of these double steps.

I made some statistics about how many internal recoding steps are required between any two charsets chosen at random. The initial recoding layout, before optimisation, always uses between 1 and 5 steps. Optimisation could sometimes produce mere copies, which are counted as no steps at all. In other cases, optimisation is unable to save any step. The number of steps after optimisation is currently between 0 and 5 steps. Of course, the expected number of steps is affected by optimisation: it drops from 2.8 to 1.8. This means that recode uses a theoretical average of a bit less than one step per recoding job. This looks good. This was computed using reversible recodings. In strict mode, optimisation might be defeated somewhat. Number of steps run between 1 and 6, both before and after optimisation, and the expected number of steps decreases by a lesser amount, going from 2.2 to 1.3. This is still manageable.


[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

  webmaster   donations   bookstore     delorie software   privacy  
  Copyright © 2003   by The Free Software Foundation     Updated Jun 2003  

Please take a moment to fill out this visitor survey
You can help support this site by visiting the advertisers that sponsor it! (only once each, though)