 


The recode reference manual


14.2 Adding new charsets

The main part of recode is written in C, as are most single steps. A few single steps need to recognise sequences of multiple characters; these are often better written in Flex. It is easy for a programmer to add a new charset to recode. All it requires is writing a few functions kept in a single `.c' file, adjusting `Makefile.am' and remaking recode.

One of the functions should convert from any previous charset to the new one. Any previous charset will do, but try to select it so that you will not lose too much information while converting. The other function should convert from the new charset to any older one. You do not have to select the same old charset as the one you selected for the previous routine. Once again, select any charset for which you will not lose too much information while converting.
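As a rough illustration of such a pair of functions, here is a minimal sketch. The charset `demo' is invented for this example: it is assumed to agree with ASCII below 0x80 and to use byte 0x80 for the Latin-1 character 0xE9. The function names, the buffer-based interface and the returned count of lost characters are all assumptions made for the sketch; they are not how recode itself declares its steps.

```c
#include <stddef.h>

#define SUBSTITUTE '?'   /* written when a character has no equivalent */

/* Convert n bytes from Latin-1 (the "previous" charset) to the invented
   `demo' charset.  Returns how many characters could not be represented. */
size_t latin1_to_demo(const unsigned char *in, size_t n, unsigned char *out)
{
    size_t lost = 0;
    for (size_t i = 0; i < n; i++) {
        if (in[i] < 0x80) {
            out[i] = in[i];          /* ASCII range is shared */
        } else if (in[i] == 0xE9) {
            out[i] = 0x80;           /* the one extra character of `demo' */
        } else {
            out[i] = SUBSTITUTE;     /* information is lost here */
            lost++;
        }
    }
    return lost;
}

/* Convert n bytes from `demo' back to an older charset, here Latin-1 again.
   Returns how many bytes were unassigned in `demo' and got substituted. */
size_t demo_to_latin1(const unsigned char *in, size_t n, unsigned char *out)
{
    size_t lost = 0;
    for (size_t i = 0; i < n; i++) {
        if (in[i] < 0x80) {
            out[i] = in[i];
        } else if (in[i] == 0x80) {
            out[i] = 0xE9;
        } else {
            out[i] = SUBSTITUTE;     /* unassigned byte in `demo' */
            lost++;
        }
    }
    return lost;
}
```

The count returned by each function makes the trade-off visible: going from Latin-1 to `demo' loses every accented character except one, while the return trip loses nothing that `demo' can actually express.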

If, for either of these two functions, you have to read multiple bytes of the old charset before recognising the character to produce, you might prefer programming it in Flex in a separate `.l' file. Prototype your C or Flex files after one of those which already exist, so as to keep the sources uniform. Besides, at make time, all `.l' files are automatically merged into a single big one by the script `mergelex.awk'.
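To show why multi-byte recognition is more awkward in plain C, here is a small sketch. It assumes an invented input convention in which the three-byte sequence backslash, apostrophe, `e' stands for the Latin-1 character 0xE9; the function name and buffer interface are hypothetical. In Flex the same recognition would be a single pattern rule, whereas C needs explicit look-ahead.

```c
#include <stddef.h>

/* Recode n bytes of `in' to Latin-1 in `out', which must hold at least
   n bytes.  Returns the number of bytes written. */
size_t escapes_to_latin1(const char *in, size_t n, unsigned char *out)
{
    size_t o = 0;
    for (size_t i = 0; i < n; i++) {
        /* Three input bytes must be seen before one output character
           can be produced; an incomplete sequence is copied unchanged. */
        if (in[i] == '\\' && i + 2 < n
            && in[i + 1] == '\'' && in[i + 2] == 'e') {
            out[o++] = 0xE9;
            i += 2;                  /* skip the rest of the sequence */
        } else {
            out[o++] = (unsigned char) in[i];
        }
    }
    return o;
}
```

With only one sequence the look-ahead stays manageable; once there are dozens of overlapping sequences, a generated scanner is usually easier to keep correct.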

There are a few hidden rules about how to write new recode modules, which allow the automatic creation of `decsteps.h' and `initsteps.h' at make time, and the proper merging of all Flex files. Imitation is a simple approach which relieves me of explaining all these rules! Start with a module closely resembling what you intend to do. Here is some advice for picking a model. First decide whether your new charset module is to be driven by algorithms or by tables. For algorithmic recodings, see `iconqnx.c' for C code, or `txtelat1.l' for Flex code. For table driven recodings, see `ebcdic.c' for one-to-one style recodings, `lat1html.c' for one-to-many style recodings, or `atarist.c' for double-step style recodings. Just select an example from the style that best fits your application.

Each of your source files should have its own initialisation function, named module_charset, which is meant to be executed quickly once, prior to any recoding. It should declare the names of your charsets and the single steps (or elementary recodings) you provide, by calling declare_step one or more times. Besides the charset names, declare_step expects a description of the recoding quality (see `recodext.h') and two functions you also provide.

The first such function has the purpose of allocating structures, pre-conditioning conversion tables, etc. It is also the place to further modify the STEP structure. This function is executed if and only if the single step is retained in an actual recoding sequence. If you do not need such delayed initialisation, merely use NULL for the function argument.
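The overall shape of a module, with its quick initialisation function and a delayed initialisation, might be sketched as follows. Everything here is a stand-in written for illustration: the STEP layout, the declare_step prototype, the integer quality value, the `table' field and the charset names `latin1' and `demo' are assumptions, not the declarations found in recode's headers. Consult `recodext.h' and an existing module for the real ones.

```c
#include <stddef.h>
#include <stdio.h>

/* Stand-ins for recode's own declarations; the real ones may differ. */
typedef struct step STEP;
typedef void (*init_fn)(STEP *);
typedef int (*file_fn)(const STEP *, FILE *, FILE *);

struct step {
    const char *before;          /* charset converted from */
    const char *after;           /* charset converted to */
    int quality;                 /* placeholder for the quality description */
    init_fn init;                /* delayed initialisation, or NULL */
    file_fn recode;              /* recodes a whole file */
    const unsigned char *table;  /* set by the delayed initialisation */
};

#define MAX_STEPS 8
STEP steps[MAX_STEPS];
size_t step_count = 0;

/* Mock registration: only records what the module declares. */
void declare_step(const char *before, const char *after, int quality,
                  init_fn init, file_fn recode)
{
    if (step_count < MAX_STEPS) {
        STEP *s = &steps[step_count++];
        s->before = before;
        s->after = after;
        s->quality = quality;
        s->init = init;
        s->recode = recode;
        s->table = NULL;
    }
}

static unsigned char latin1_to_demo_table[256];

/* Delayed initialisation: the table is built only if this step is used. */
static void init_latin1_demo(STEP *step)
{
    for (int c = 0; c < 256; c++)
        latin1_to_demo_table[c] = (unsigned char) (c < 0x80 ? c : '?');
    latin1_to_demo_table[0xE9] = 0x80;
    step->table = latin1_to_demo_table;
}

/* Table driven recoding of a whole file. */
static int file_latin1_demo(const STEP *step, FILE *in, FILE *out)
{
    int c;
    while ((c = getc(in)) != EOF)
        putc(step->table[c], out);
    return ferror(in) || ferror(out);
}

/* Algorithmic recoding; it needs no set-up, hence NULL at registration. */
static int file_demo_latin1(const STEP *step, FILE *in, FILE *out)
{
    int c;
    (void) step;
    while ((c = getc(in)) != EOF)
        putc(c == 0x80 ? 0xE9 : (c < 0x80 ? c : '?'), out);
    return ferror(in) || ferror(out);
}

/* Module initialisation: cheap, it only declares the steps. */
void module_demo(void)
{
    declare_step("latin1", "demo", 0, init_latin1_demo, file_latin1_demo);
    declare_step("demo", "latin1", 0, NULL, file_demo_latin1);
}
```

The point of the split is cost: module_demo runs for every module on every invocation, so it does nothing but register, while the table is built only when the step actually takes part in a recoding sequence.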

The second function executes the elementary recoding on a whole file. There are a few cases when you can spare writing this function:

If you have a recoding table handy in a suitable format but do not use one of the predefined recoding functions, it is still a good idea to use a delayed initialisation to save it anyway, because recode option `-h' will take advantage of this information when available.

Finally, edit `Makefile.am' to add the source file name of your routines to the C_STEPS or L_STEPS macro definition, depending on whether your routines are written in C or in Flex.



  Copyright © 2003   by The Free Software Foundation     Updated Jun 2003  
