recode reference manual
In our experience, when recode does not give satisfying results,
either recode was not called properly, the results were correct but
raised some doubts nevertheless, or the files to recode were somewhat mangled.
Genuine bugs are certainly possible too.
Unless you already are a recode expert, it might be a good idea to
quickly revisit the tutorial (see section 1. Quick Tutorial) or the prior sections in this
chapter, to make sure that you properly formatted your recoding request.
If you intended to use recode as a filter, make sure that you
did not forget to redirect your standard input (using the <
symbol in the shell, say). Some apparent recode mysteries are also
easily explained; see section 3.5 Reversibility issues.
For the other cases, some investigation is needed. To illustrate how to
proceed, let's presume that you want to recode the `nicepage' file,
coded UTF-8, into HTML. The problem is that the command
`recode u8..h nicepage' yields:
    recode: Invalid input in step `UTF-8..ISO-10646-UCS-2'
One good trick is to use recode in filter mode instead of in file
replacement mode; see section 3.1 Synopsis of recode call. Another good trick is to use the
`-v' option, which asks for a verbose description of the recoding steps.
We could rewrite our recoding call as `recode -v u8..h <nicepage',
to get something like:
    Request: UTF-8..:libiconv:..ISO-10646-UCS-2..HTML_4.0
    Shrunk to: UTF-8..ISO-10646-UCS-2..HTML_4.0
    [...some output...]
    recode: Invalid input in step `UTF-8..ISO-10646-UCS-2'
This might help you to better understand what the diagnostic means. The
recoding request is achieved in two steps: the first recodes UTF-8
into UCS-2, the second recodes UCS-2 into HTML.
The problem occurs within the first of these two steps. Since the
input of this step is the input file given to recode, it is
this overall input file which seems to be invalid. Also, when used in
filter mode, recode processes as much input as possible before the
error occurs, and sends the result of this processing to standard output.
Since standard output has not been redirected to a file, it is merely
displayed on the user's screen. By inspecting the end of the resulting
HTML output, that is, what was recoded shortly before the recoding
was interrupted, you may infer where the error lies in the actual
UTF-8 input file.
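The inspection just described can be sketched in shell. This is only a sketch under assumptions: the function name `tail_of_partial_output' and the 200-byte window are made up for illustration, and the `mktemp' utility is assumed to be available. The function runs any filter on a file, prints the last bytes the filter wrote before it stopped, and returns the filter's exit status.

```shell
# Hypothetical helper, not part of recode: run a filter command with FILE as
# its standard input, then print the last 200 bytes it managed to write.
# Usage: tail_of_partial_output FILE COMMAND [ARGS...]
tail_of_partial_output() {
    tpo_input=$1
    shift
    tpo_out=$(mktemp) || return 1
    tpo_rc=0
    # Only standard output is captured; diagnostics still reach standard error.
    "$@" <"$tpo_input" >"$tpo_out" || tpo_rc=$?
    tail -c 200 "$tpo_out"
    rm -f "$tpo_out"
    return "$tpo_rc"
}

# Intended use with the example from the text (requires recode):
#   tail_of_partial_output nicepage recode u8..h
```

Searching the input file for the text shown at the end of that output should lead close to the offending bytes.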
If you have the proper tools to examine the intermediate recoding data,
you might also prefer to reduce the problem to a single step to better
study it. This is what I usually do. For example, the last recode
call above is more or less equivalent to:
    recode -v UTF-8..ISO_10646-UCS-2 <nicepage >temporary
    recode -v ISO_10646-UCS-2..HTML_4.0 <temporary
    rm temporary
If you know that the problem is within the first step, you might prefer to
concentrate on using the first recode line. If you know that the
problem is within the second step, you might execute the first recode
line once and for all, and then play with the second recode call,
repeatedly using the `temporary' file created once by the first call.
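For examining the intermediate data, a standard tool such as od may be enough. The following is a minimal sketch: `dump_tail' is a hypothetical name chosen for illustration, and `temporary' stands for the file written by the first recode call above.

```shell
# Hypothetical helper, not part of recode: hex-dump the last COUNT bytes
# (default 32) of FILE, one byte per column.
# The offsets printed by od are relative to the start of the dumped slice,
# not to the start of FILE.
# Usage: dump_tail FILE [COUNT]
dump_tail() {
    tail -c "${2:-32}" "$1" | od -A d -t x1
}

# Intended use with the example from the text:
#   dump_tail temporary 64
```

Looking at the last bytes of the intermediate file is a reasonable starting point when a step stopped early, since they are presumably the data produced just before the trouble.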
Note that the `-f' switch may be used to force the production of
HTML output despite invalid input. This might be satisfying enough
for you, and easier than repairing the input file. That depends on how
strict you would like to be about the precision of the recoding process.
If you later see that your HTML file begins with `&lt;html&gt;' when
you expected `<html>', then recode might have done a bit more
than you wanted. In this case, your input file was already half-UTF-8,
half-HTML, that is, a mixed file (see section 3.7 Using mixed charset input). There is a
special `-d' switch for this case. So, you might end up calling
`recode -fd u8..h nicepage'. Until you are quite sure that you accept
overwriting your input file no matter what, I recommend that you stick with
filter mode.
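While staying in filter mode, the symptom of a mixed file can be checked mechanically. The sketch below is a rough heuristic, not a recode feature: the name `has_escaped_tags' is made up for illustration, and the pattern only recognizes simple escaped tags.

```shell
# Hypothetical helper, not part of recode: succeed (exit status 0) if standard
# input contains an escaped tag such as &lt;html&gt; or &lt;/p&gt;, which
# suggests that the data given to recode already held HTML markup.
has_escaped_tags() {
    grep -Eq '&lt;/?[A-Za-z][A-Za-z0-9]*[^&]*&gt;'
}

# Intended use with the example from the text (requires recode):
#   recode -f u8..h <nicepage | has_escaped_tags && echo "input looks mixed"
```

A lone escaped `<' in running text, as in `a &lt; b', is not reported, because the pattern requires a tag name right after the `&lt;'.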
If, after such experiments, you seriously think that the recode
program does not behave properly, there might be a genuine bug in the
program itself, in which case I invite you to contribute a bug report;
see section 2.3 Contributions and bug reports.
| Copyright © 2003 by The Free Software Foundation | Updated Jun 2003 |