
The recode reference manual




I'm not inclined to accept a charset you just invented and which nobody uses yet: convince your friends and community first!


In previous versions of recode, a single colon `:' was used instead of the two dots `..' for separating charsets, but this created problems because colons are allowed in official charset names. The old request syntax is still recognised for compatibility purposes, but is deprecated.
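
The ambiguity can be illustrated with a short sketch. The helper `split_request` below is hypothetical and does not reproduce how recode parses requests. It only shows that splitting on `..' keeps a registered charset name such as ISO_8859-1:1987, which contains a colon, intact, while splitting on `:' does not.

```python
def split_request(request, separator=".."):
    """Split a 'before<sep>after' request into its two charset names.

    Hypothetical helper for illustration only; not recode's actual parser.
    """
    before, found, after = request.partition(separator)
    if not found:
        raise ValueError(f"no {separator!r} separator in {request!r}")
    return before, after


# With the two-dot separator, the colon inside the charset name is harmless.
print(split_request("ISO_8859-1:1987..IBM-PC"))

# With the old single-colon separator, the charset name itself is cut in two.
print(split_request("ISO_8859-1:1987:IBM-PC", separator=":"))
```

The first call yields the two intended names; the second stops at the colon inside the name, so the "before" part is truncated.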


More precisely, pc is an alias for the charset IBM-PC.


Both before and after may be omitted, in which case the double dot separator is mandatory. This is not very useful, as the recoding reduces to a mere copy in that case.


MS-DOS is one of those systems for which the default charset has implied surfaces, CR-LF here. Such surfaces are automatically removed or applied whenever the default charset is read or written, exactly as for any other charset. In the example above, on such systems, the hexadecimal surface would then replace the implied surfaces. To add a hexadecimal surface without removing any, one should write the request as `/../x'.


There are still some cases of ambiguous output which are rather difficult to detect, and for which the protection is not active.


The minimality of a UTF-8 encoding is guaranteed on output, but currently, it is not checked on input.
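
To make "minimality" concrete: a code point is minimally encoded when it uses the shortest UTF-8 form able to hold it. The sketch below shows what an input check could look like. It is not recode's code, the function names are hypothetical, and it assumes a single sequence of at most four bytes.

```python
def decode_sequence(seq):
    """Decode one UTF-8 sequence (1 to 4 bytes) without checking minimality."""
    first = seq[0]
    if first < 0x80:
        length, value = 1, first
    elif first >> 5 == 0b110:
        length, value = 2, first & 0x1F
    elif first >> 4 == 0b1110:
        length, value = 3, first & 0x0F
    elif first >> 3 == 0b11110:
        length, value = 4, first & 0x07
    else:
        raise ValueError("invalid leading byte")
    if len(seq) != length:
        raise ValueError("wrong sequence length")
    for byte in seq[1:]:
        if byte >> 6 != 0b10:
            raise ValueError("invalid continuation byte")
        value = (value << 6) | (byte & 0x3F)
    return value


def minimal_length(code_point):
    """Return the shortest UTF-8 length able to hold this code point."""
    if code_point < 0x80:
        return 1
    if code_point < 0x800:
        return 2
    if code_point < 0x10000:
        return 3
    return 4


def is_minimal_utf8_sequence(seq):
    """True if the sequence is no longer than its code point requires."""
    return len(seq) == minimal_length(decode_sequence(seq))


print(is_minimal_utf8_sequence(b"\xc3\xa9"))  # U+00E9 in two bytes
print(is_minimal_utf8_sequence(b"\xc0\x80"))  # U+0000 padded to two bytes
```

The second sequence decodes to a code point that fits in one byte, so it is an overlong form that such a check would reject.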


Another approach would have been to define the level symbols as masks instead, to give masks to the threshold-setting routines, and to retain all errors--yet I have never met such a need myself in practice, so I fear it would be overkill. On the other hand, it might be interesting to maintain counters of how many times each kind of error occurred.
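
The designs contrasted here can be sketched side by side. Everything named below (`Level`, `Reporter`, and the individual level names) is hypothetical and chosen for illustration; it does not reproduce recode's actual error symbols or API. The sketch keeps ordered levels with a threshold, and adds the two alternatives mentioned: a mask retaining every kind of error seen, and per-kind counters.

```python
from collections import Counter
from enum import IntEnum


class Level(IntEnum):
    """Ordered error levels (hypothetical names, for illustration only)."""
    NONE = 0
    NOT_CANONICAL = 1
    AMBIGUOUS_OUTPUT = 2
    UNTRANSLATABLE = 3
    INVALID_INPUT = 4


class Reporter:
    def __init__(self, threshold):
        self.threshold = threshold  # stop at this level or worse
        self.worst = Level.NONE     # threshold design: only the worst is kept
        self.counts = Counter()     # counter idea: occurrences per kind

    def report(self, level):
        """Record an error; return True if processing should stop."""
        self.counts[level] += 1
        self.worst = max(self.worst, level)
        return level >= self.threshold

    def as_mask(self):
        """Mask design: one bit per kind of error seen so far."""
        mask = 0
        for level in self.counts:
            mask |= 1 << level
        return mask


reporter = Reporter(threshold=Level.UNTRANSLATABLE)
print(reporter.report(Level.NOT_CANONICAL))  # below the threshold
print(reporter.report(Level.INVALID_INPUT))  # at or above the threshold
print(reporter.worst.name, bin(reporter.as_mask()), dict(reporter.counts))
```

With ordered levels, a single comparison decides whether to stop; the mask and the counters cost a little more bookkeeping but keep information that the "worst level" summary discards.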


It is unlikely that recode will ever support UTF-1.


This is the case when the goal charset allows for 16 bits. For shorter charsets, the `--strict' (`-s') option decides what happens: either the character is dropped, or a reversible mapping is produced on the fly.


On DOS/Windows, stock shells do not know that apostrophes quote special characters like |, so one needs to use double quotes instead of apostrophes.


This convention replaced an older one saying that up to 4 immediately preceding pairs of zero bytes, going backward, are to be considered as part of the end of line and not interpreted as ::.


There are supposed to be seven words in this case. So, one is missing.


Look at one of the following sentences (the second has to be interpreted with the `-c' option):

"Ai"e!  Voici le proble`me que j'ai"
Ai:e!  Voici le proble`me que j'ai:

There is an ambiguity between an ai", the small animal, and the present indicative of avoir (first person singular), when followed by what could be a diaeresis mark. Fortunately, the case is resolved by the fact that an apostrophe always precedes the verb and almost never the animal.
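
The apostrophe rule can be sketched as follows. This is only an illustration of the heuristic described above, not recode's actual algorithm: the function name `resolve_ai_mark` is hypothetical, and the sketch handles nothing but the sequence `ai' followed by the mark.

```python
import re


def resolve_ai_mark(text, mark='"'):
    """Read `ai<mark>` as a-i-diaeresis, except right after an apostrophe.

    Hypothetical helper: an apostrophe before `ai` signals the verb (j'ai),
    so the mark is then left alone as ordinary punctuation.
    """
    pattern = re.compile(r"(?<!')([Aa])i" + re.escape(mark))
    # "\u00ef" is LATIN SMALL LETTER I WITH DIAERESIS.
    return pattern.sub(lambda m: m.group(1) + "\u00ef", text)


print(resolve_ai_mark('"Ai"e!  Voici le proble`me que j\'ai"'))
print(resolve_ai_mark("Ai:e!  Voici le proble`me que j'ai:", mark=":"))
```

In both sentences the mark after `Ai' becomes a diaeresis, while the one after `j'ai' is kept unchanged.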


I did not pay attention to proper nouns, but this one showed up as being fairly evident.


Usually, quail means quail egg in Japanese, while egg alone usually means chicken egg. Both quail eggs and chicken eggs are popular foods in Japan. The quail input system was so named because it is smaller than the previous EGG system. As for EGG, it is the translation of TAMAGO. This word comes from the Japanese sentence takusan matasete gomennasai, meaning sorry to have kept you waiting so long. Of course, the publication of EGG has been delayed many times... (Story by Takahashi Naoto)


These are mere examples to explain the concept; recode actually has only Base64 and CR-LF.


If strict mapping is requested, another efficient device will be used instead of a permutation.

Copyright 2003 by The Free Software Foundation. Updated Jun 2003.