recode reference manual
The following options are somewhat related to reversibility issues:
With option `-f', recode does not exit with a non-zero status if
irreversibility would be the only reason for doing so. See section 3.5 Reversibility issues.
Without this option, recode tries to protect you against recoding
a file irreversibly over itself. Whenever recode meets an irreversible
recoding, or any other recoding error, it produces a warning on
standard error. The current input file does not get replaced by its
recoded version, and recode then proceeds with the recoding of
the next file.
When the program is merely used as a filter, standard output will have
received a partially recoded copy of standard input, up to the first
error point. After all recodings have been done or attempted, and if
some recoding has been aborted, recode exits with a non-zero status.
In releases of recode prior to version 3.5, this option was always
selected, so it was rather meaningless. Nevertheless, users were invited
to start using `-f' right away in scripts calling recode
whenever convenient, in preparation for the current behaviour.
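Option `-q' only inhibits warning messages about irreversible recodings
and similar diagnostics. It does not prevent recodings from being aborted,
nor recode from returning a non-zero exit status when irreversible
recodings are met.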
This option is set automatically for the child processes when recode splits itself into many collaborating copies. This way, the diagnostic is issued only once, by the parent. See option `-p'.
Option `-s' requests that recode be very strict while recoding a file:
any character which is not explicitly mapped from one charset to the
other is simply lost in the transformation. Such a loss is not
reversible and so will cause recode to fail, unless the option
`-f' is also given as a kind of counter-measure.
Using `-s' without `-f' might render the recode program
very susceptible to the slightest file abnormalities. Even though this
might be irritating to some users, such paranoia is sometimes wanted
and useful.
Even if recode tries hard to keep the recodings reversible,
you should not develop an unconditional confidence in its ability to
do so. You ought to keep only reasonable expectations about
reverse recodings. In particular, consider:
Recoding from IBM-PC to Latin-1. End of lines are represented as
`\r\n' in IBM-PC and as `\n' in Latin-1. There
is no way for a faulty IBM-PC file containing a `\n'
not preceded by `\r' to be translated into a Latin-1 file, and
then back to the same faulty file.
Code equivalences. In a LaTeX charset file, the string `\^\i{}'
could be recoded back and forth through another charset and become
`\^{\i}'. Even if the resulting file is equivalent to the
original one, it is not identical.
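The end-of-line case can be illustrated with a minimal Python sketch. It models only the line-ending part of the conversion, using two hypothetical helpers, `pc_to_unix` and `unix_to_pc`; it is an illustration of the problem, not a description of how recode is implemented.

```python
def pc_to_unix(data: bytes) -> bytes:
    # Turn IBM-PC style line endings (\r\n) into Latin-1 style ones (\n).
    return data.replace(b"\r\n", b"\n")


def unix_to_pc(data: bytes) -> bytes:
    # Turn every \n back into \r\n.
    return data.replace(b"\n", b"\r\n")


well_formed = b"one\r\ntwo\r\n"
faulty = b"one\ntwo\r\n"  # the first line ends with a bare \n

# A well-formed file survives the round trip unchanged.
round_trip_ok = unix_to_pc(pc_to_unix(well_formed))

# The faulty file does not: its bare \n comes back as \r\n,
# so the information that it was faulty has been lost.
round_trip_faulty = unix_to_pc(pc_to_unix(faulty))

print(round_trip_ok == well_formed)   # True
print(round_trip_faulty == faulty)    # False
```

Once both kinds of line ending have been merged into a single `\n`, nothing in the intermediate file records which one was there originally.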
Unless option `-s' is used, recode automatically tries to
fill mappings with invented correspondences, often making them fully
reversible. This filling is not made at random. The algorithm tries to
stick to the identity mapping and, when this is not possible, it prefers
generating many small permutation cycles, each involving only a few
codes.
For example, here is how IBM-PC code 186 gets translated to
control-U in Latin-1. Control-U is 21. Code 21 is the
IBM-PC section sign, which is 167 in Latin-1. recode
cannot reciprocate 167 to 21, because 167 is the masculine ordinal indicator
within IBM-PC, which is 186 in Latin-1. Code 186 within
IBM-PC has no Latin-1 equivalent; by assigning it back to 21,
recode closes this short permutation loop.
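The following Python sketch shows one simple way to obtain this kind of filling. It is an assumption made for illustration, not recode's actual code: the function name `fill_mapping` is hypothetical, and the partial table contains only the two correspondences quoted above. Each unmapped code is sent to the start of the chain that leads to it, which yields the identity when there is no chain and a short cycle otherwise.

```python
def fill_mapping(known: dict[int, int], size: int = 256) -> dict[int, int]:
    """Complete a partial one-to-one mapping into a permutation of range(size)."""
    filled = dict(known)
    inverse = {target: source for source, target in known.items()}
    if len(inverse) != len(known):
        raise ValueError("the known mapping must be one-to-one")
    for code in range(size):
        if code in filled:
            continue
        # Walk backwards from the unmapped code to the start of its chain,
        # that is, to a code that nothing maps to yet.
        start = code
        while start in inverse:
            start = inverse[start]
        # Closing the chain gives the identity when the chain is empty
        # (start == code) and one short cycle otherwise.
        filled[code] = start
        inverse[start] = code
    return filled


# Only the two explicit correspondences from the example above:
# IBM-PC 21 (section sign) -> Latin-1 167,
# IBM-PC 167 (masculine ordinal) -> Latin-1 186.
known = {21: 167, 167: 186}
table = fill_mapping(known)

print(table[186])  # 21: the invented correspondence that closes the loop
print(table[65])   # 65: codes outside any chain keep the identity mapping
```

Because every chain is closed on itself, the completed table is a permutation and can therefore be inverted.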
As a consequence of this map filling, recode may sometimes produce
funny characters. They may look annoying, but they are nevertheless
helpful when you change your mind and want to revert to the prior
recoding. If you cannot stand these, use option `-s', which asks
for a very strict recoding.
This map filling sometimes has a few surprising consequences, which some users wrongly interpreted as bugs. Here are two examples.
First, recode sometimes seems to copy a file without recoding it.
But in fact, it does recode it. Consider the requests:

    recode l1..us < File-Latin1 > File-ASCII
    cmp File-Latin1 File-ASCII

Here, cmp will not report any difference. This is quite normal.
Latin-1 gets correctly recoded to ASCII for the characters the two
charsets have in common (which are the first 128 characters, in this
case). The remaining 128 Latin-1 characters have no ASCII
correspondent. Instead of losing them, recode elects to map them to
unspecified characters of ASCII, thus making the recoding reversible.
The simplest way of achieving this is merely to keep those last 128
characters unchanged. The overall effect is copying the file verbatim.
If you feel this behaviour is too generous and if you do not
care about reversibility, simply use option `-s'. By doing so,
recode will strictly map only those Latin-1 characters which have
an ASCII equivalent, and will merely drop those which do not. Then,
you are more likely to observe a difference between the
input and the output file.
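The difference between the two behaviours can be modelled in a few lines of Python. This is only a sketch of the behaviour described above, with a hypothetical function name `recode_latin1_to_ascii`; it does not call recode.

```python
def recode_latin1_to_ascii(data: bytes, strict: bool = False) -> bytes:
    out = bytearray()
    for byte in data:
        if byte < 128:
            # Common to both charsets: recoded to itself.
            out.append(byte)
        elif not strict:
            # No ASCII equivalent: keep the code unchanged, which is the
            # simplest invented correspondence that stays reversible.
            out.append(byte)
        # In strict mode, a character without an ASCII equivalent is dropped.
    return bytes(out)


sample = b"caf\xe9\n"  # "café" in Latin-1

relaxed = recode_latin1_to_ascii(sample)
strict = recode_latin1_to_ascii(sample, strict=True)

print(relaxed == sample)  # True: the output is a verbatim copy
print(strict)             # b'caf\n': the accented letter is gone
```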
Second, consider the requests:

    recode 437..l1 < File-Latin1 > Temp1
    recode 437..l1 < Temp1 > Temp2

The first request wrongly declares `File-Latin1' to be an IBM-PC file
and recodes it to Latin-1. This is surely ill defined and not meaningful.
Yet, once this step has been repeated a second time, you might notice that
many (not all) characters in `Temp2' are identical to those in
`File-Latin1'. Sometimes, people try to discover how recode
works by experimenting a little at random, rather than reading and
understanding the documentation; results such as this are surely confusing,
as they give those people the false feeling that they have understood
something.
Reversible codings have the property that, if applied several times
in the same direction, they will eventually bring any character back
to its original value. Since recode seeks small permutation
cycles when creating reversible codings, besides characters unchanged
by the recoding, most permutation cycles will be of length 2,
fewer of length 3, and so on. So, it is only to be expected that applying the
recoding twice in the same direction will recover most characters,
but will fail to recover those participating in permutation cycles of
length 3. On the other hand, recoding six times in the same direction
would recover all characters in cycles of length 1, 2, 3 or 6.
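A small Python sketch makes the arithmetic concrete. The table below is hypothetical: it contains one unchanged code, one invented cycle of length 2, and the cycle of length 3 from the IBM-PC example earlier in this section.

```python
def apply_times(table: dict[int, int], code: int, times: int) -> int:
    # Recode a single code several times in the same direction.
    for _ in range(times):
        code = table[code]
    return code


table = {
    65: 65,                        # unchanged code (cycle of length 1)
    1: 2, 2: 1,                    # invented cycle of length 2
    21: 167, 167: 186, 186: 21,    # cycle of length 3
}

recovered_after_2 = sorted(c for c in table if apply_times(table, c, 2) == c)
recovered_after_6 = sorted(c for c in table if apply_times(table, c, 6) == c)

print(recovered_after_2)  # [1, 2, 65]
print(recovered_after_6)  # [1, 2, 21, 65, 167, 186]
```

A code returns to its original value after n recodings exactly when the length of its cycle divides n, which is why six recodings recover the cycles of length 1, 2, 3 and 6.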
Copyright © 2003 by The Free Software Foundation. Updated Jun 2003.