The recode reference manual

3.5 Reversibility issues

The following options are somewhat related to reversibility issues:

`-f'
`--force'
With this option, irreversible or otherwise erroneous recodings are run to completion, and recode does not exit with a non-zero status when irreversibility would be the only reason for doing so. See section 3.5 Reversibility issues.

Without this option, recode tries to protect you against recoding a file irreversibly over itself. Whenever an irreversible recoding is met, or any other recoding error, recode produces a warning on standard error. The current input file does not get replaced by its recoded version, and recode then proceeds with the recoding of the next file.

When the program is merely used as a filter, standard output will have received a partially recoded copy of standard input, up to the first error point. After all recodings have been done or attempted, and if some recoding has been aborted, recode exits with a non-zero status.

In releases of recode prior to version 3.5, this option was always selected, so it was rather meaningless. Nevertheless, users were invited to start using `-f' right away in scripts calling recode whenever convenient, in preparation for the current behaviour.

`-q'
`--quiet'
`--silent'
This option has the sole purpose of inhibiting warning messages about irreversible recodings, and other such diagnostics. It has no other effect; in particular, it does not prevent recodings from being aborted, nor recode from returning a non-zero exit status, when irreversible recodings are met.

This option is set automatically for the child processes when recode splits itself into several collaborating copies. That way, the diagnostic is issued only once, by the parent. See option `-p'.

`-s'
`--strict'
By using this option, the user requests that recode be very strict while recoding a file, merely losing in the transformation any character which is not explicitly mapped from one charset to another. Such a loss is not reversible, and so will cause recode to fail unless the option `-f' is also given as a kind of countermeasure.

Using `-s' without `-f' might render the recode program very susceptible to the slightest file abnormalities. Despite the fact that it might be irritating to some users, such paranoia is sometimes wanted and useful.
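The interaction between `-s' and `-f' described above can be pictured with a small Python model. This is only a sketch under stated assumptions, not recode's implementation: the function `strict_recode', the table `LATIN1_TO_ASCII' and the 0/1 status values are hypothetical names chosen for illustration. The model stops at the first unmapped character unless forced, mirroring the filter behaviour described under `-f'.

```python
# Toy model only: not recode's real code, tables, or exit-status logic.

def strict_recode(data: bytes, table: dict, force: bool = False) -> tuple:
    """Recode `data` through `table`, treating unmapped bytes as irreversible.

    Returns (output, status). Without `force`, stop at the first unmapped
    byte and report status 1; with `force`, drop such bytes and report 0.
    """
    out = bytearray()
    for byte in data:
        if byte in table:
            out.append(table[byte])
        elif force:
            continue  # the character is lost, but the run completes
        else:
            return bytes(out), 1  # partial output up to the first error
    return bytes(out), 0


# Hypothetical strict Latin-1 to ASCII table: only the shared 128 codes.
LATIN1_TO_ASCII = {code: code for code in range(128)}

sample = "café au lait".encode("latin-1")
print(strict_recode(sample, LATIN1_TO_ASCII))              # (b'caf', 1)
print(strict_recode(sample, LATIN1_TO_ASCII, force=True))  # (b'caf au lait', 0)
```

In this model, the forced run completes but silently loses the accented letter, which is exactly the kind of loss that cannot be undone by a reverse recoding.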

Even if recode tries hard to keep the recodings reversible, you should not develop an unconditional confidence in its ability to do so. You ought to keep only reasonable expectations about reverse recodings. In particular, consider:

Unless option `-s' is used, recode automatically tries to fill mappings with invented correspondences, often making them fully reversible. This filling is not made at random. The algorithm tries to stick to the identity mapping and, when this is not possible, it prefers generating many small permutation cycles, each involving only a few codes.

For example, here is how IBM-PC code 186 gets translated to control-U in Latin-1. Control-U is 21. Code 21 is the IBM-PC section sign, which is 167 in Latin-1. recode cannot reciprocate 167 to 21, because 167 is the masculine ordinal indicator within IBM-PC, which is 186 in Latin-1. Code 186 within IBM-PC has no Latin-1 equivalent; by assigning it back to 21, recode closes this short permutation loop.
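The loop just described can be reproduced with a short Python sketch. It assumes a simple filling rule (close each chain of known pairs by sending its last code back to its first one), which is enough to reproduce this example; recode's actual algorithm may well differ. The function `fill_mapping' and the extra code 200 are hypothetical and serve only the illustration.

```python
# Illustrative sketch; recode's real filling algorithm may differ.

def fill_mapping(known: dict, codes: list) -> dict:
    """Extend the injective partial map `known` to a bijection on `codes`.

    Each chain of known pairs is closed by sending its last code back to
    its first one. A code that appears in no pair is a chain of length one,
    so it maps to itself (the identity).
    """
    inverse = {target: source for source, target in known.items()}
    filled = dict(known)
    for code in codes:
        if code in known:
            continue  # already has an explicit correspondence
        head = code
        while head in inverse:  # walk back to the start of the chain
            head = inverse[head]
        filled[code] = head
    return filled


# The two correspondences from the text: IBM-PC 21 -> Latin-1 167
# (section sign) and IBM-PC 167 -> Latin-1 186 (masculine ordinal).
# Code 200 is included only as an example of a code in no known pair.
known = {21: 167, 167: 186}
table = fill_mapping(known, [21, 167, 186, 200])
print(table)  # {21: 167, 167: 186, 186: 21, 200: 200}
```

The invented pair 186 -> 21 is the one that closes the three-code loop, while the untouched code keeps the identity mapping.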

As a consequence of this map filling, recode may sometimes produce funny characters. They may look annoying, but they are nevertheless helpful when one changes one's mind and wants to revert to the prior recoding. If you cannot stand these, use option `-s', which asks for a very strict recoding.

This map filling sometimes has a few surprising consequences, which some users wrongly interpreted as bugs. Here are two examples.

  1. In some cases, recode seems to copy a file without recoding it. But in fact, it does. Consider a request:

     
    recode l1..us < File-Latin1 > File-ASCII
    cmp File-Latin1 File-ASCII
    

    then cmp will not report any difference. This is quite normal. Latin-1 gets correctly recoded to ASCII for the characters the two charsets have in common (which are the first 128 characters, in this case). The remaining 128 Latin-1 characters have no ASCII correspondent. Instead of losing them, recode elects to map them to unspecified characters of ASCII, thus making the recoding reversible. The simplest way of achieving this is merely to keep those last 128 characters unchanged. The overall effect is copying the file verbatim.

    If you feel this behaviour is too generous and if you do not wish to care about reversibility, simply use option `-s'. By doing so, recode will strictly map only those Latin-1 characters which have an ASCII equivalent, and will merely drop those which do not. You are then more likely to observe a difference between the input and the output file.

  2. Recoding the wrong way could sometimes give the false impression that recoding has almost been done properly. Consider the requests:

     
    recode 437..l1 < File-Latin1 > Temp1
    recode 437..l1 < Temp1 > Temp2
    

    thereby wrongly declaring `File-Latin1' to be an IBM-PC file, and recoding to Latin-1. This is surely ill-defined and not meaningful. Yet, if you repeat this step a second time, you might notice that many (not all) characters in `Temp2' are identical to those in `File-Latin1'. Sometimes, people try to discover how recode works by experimenting a little at random, rather than reading and understanding the documentation; results such as this are surely confusing, as they provide those people with a false feeling that they understood something.

    Reversible codings have the property that, if applied several times in the same direction, they will eventually bring any character back to its original value. Since recode seeks small permutation cycles when creating reversible codings, besides characters unchanged by the recoding, most permutation cycles will be of length 2, fewer of length 3, and so on. So, it is only to be expected that applying the recoding twice in the same direction will recover most characters, but will fail to recover those participating in permutation cycles of length 3. On the other hand, recoding six times in the same direction would recover all characters in cycles of length 1, 2, 3 or 6.
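The arithmetic behind the second example can be checked with a few lines of Python. The permutation below is made up for the purpose (one fixed point, one cycle of length 2, one of length 3) and is not an actual recode table; `cycle_lengths', `apply_times' and `recovered' are hypothetical helper names. A code returns to its original value after k applications exactly when its cycle length divides k.

```python
# Sketch with a made-up permutation; not an actual recode table.

def cycle_lengths(perm: dict) -> dict:
    """Return, for each code, the length of the permutation cycle it lies on."""
    lengths = {}
    for start in perm:
        if start in lengths:
            continue
        cycle = [start]
        nxt = perm[start]
        while nxt != start:
            cycle.append(nxt)
            nxt = perm[nxt]
        for code in cycle:
            lengths[code] = len(cycle)
    return lengths


def apply_times(perm: dict, code: int, times: int) -> int:
    """Apply `perm` to `code` the given number of times."""
    for _ in range(times):
        code = perm[code]
    return code


def recovered(perm: dict, times: int) -> list:
    """List the codes that are back to their original value after `times` runs."""
    return sorted(c for c in perm if apply_times(perm, c, times) == c)


# One fixed point (65), one 2-cycle (1, 2), one 3-cycle (21, 167, 186).
perm = {65: 65, 1: 2, 2: 1, 21: 167, 167: 186, 186: 21}

print(cycle_lengths(perm))  # {65: 1, 1: 2, 2: 2, 21: 3, 167: 3, 186: 3}
print(recovered(perm, 2))   # [1, 2, 65]
print(recovered(perm, 6))   # [1, 2, 21, 65, 167, 186]
```

Two applications leave the three codes of the 3-cycle displaced, while six applications, a common multiple of 1, 2 and 3, restore every code.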


  Copyright © 2003   by The Free Software Foundation     Updated Jun 2003  
