The GNU C Library

6.5.4.3 iconv module data structures

So far this section has described how modules are located and selected for use. What remains to be described is the interface of the modules so that one can write new ones. This section describes the interface as it is in use in January 1999. The interface will change a bit in the future but, with luck, only in an upwardly compatible way.

The definitions necessary to write new modules are publicly available in the non-standard header `gconv.h'. The following text, therefore, describes the definitions from this header file. First, however, it is necessary to get an overview.

From the perspective of the user of iconv the interface is quite simple: the iconv_open function returns a handle that can be used in calls to iconv, and finally the handle is freed with a call to iconv_close. The problem is that the handle has to be able to represent the possibly long sequences of conversion steps and also the state of each conversion, since the handle is all that is passed to the iconv function. Therefore, the data structures are really the elements needed to understand the implementation.
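
The user-level call sequence described above can be sketched as follows. This is a minimal illustration, not part of the manual: demo_convert is a hypothetical helper, and the charset names passed to it in the usage note are assumed to be known to the local iconv implementation.

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: convert the NUL-terminated string IN from
   charset FROM to charset TO, writing at most OUTSIZE bytes to OUT.
   Returns the number of bytes written, or (size_t) -1 on error.  */
size_t
demo_convert (const char *to, const char *from,
              const char *in, char *out, size_t outsize)
{
  assert (in != NULL && out != NULL);

  /* Step 1: obtain a handle.  It stands for the whole chain of
     conversion steps and their state.  */
  iconv_t cd = iconv_open (to, from);
  if (cd == (iconv_t) -1)
    return (size_t) -1;

  char *inptr = (char *) in;    /* iconv takes char **, not const char ** */
  size_t inleft = strlen (in);
  char *outptr = out;
  size_t outleft = outsize;

  /* Step 2: convert.  Only the handle and the buffers are passed.  */
  size_t rc = iconv (cd, &inptr, &inleft, &outptr, &outleft);
  if (rc != (size_t) -1)
    /* Flush any pending shift sequence of a stateful target charset.  */
    rc = iconv (cd, NULL, NULL, &outptr, &outleft);

  /* Step 3: free the handle.  */
  iconv_close (cd);

  return rc == (size_t) -1 ? (size_t) -1 : outsize - outleft;
}
```

For example, demo_convert ("UCS-2LE", "UTF-8", text, buf, sizeof buf) would convert UTF-8 text into two-byte little-endian code units, provided both names are supported.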

We need two different kinds of data structures. The first describes the conversion and the second describes the state and other per-use information. There are really two type definitions like this in `gconv.h'.

Data type: struct __gconv_step
This data structure describes one conversion a module can perform. For each function in a loaded module with conversion functions there is exactly one object of this type. This object is shared by all users of the conversion (i.e., this object does not contain any information corresponding to an actual conversion; it only describes the conversion itself).

struct __gconv_loaded_object *__shlib_handle
const char *__modname
int __counter
All these elements of the structure are used internally in the C library to coordinate loading and unloading the shared object. One must not expect any of the other elements to be available or initialized.

const char *__from_name
const char *__to_name
__from_name and __to_name contain the names of the source and destination character sets. They can be used to identify the actual conversion to be carried out since one module might implement conversions for more than one character set and/or direction.

gconv_fct __fct
gconv_init_fct __init_fct
gconv_end_fct __end_fct
These elements contain pointers to the functions in the loadable module. The interface will be explained below.

int __min_needed_from
int __max_needed_from
int __min_needed_to
int __max_needed_to
These values have to be supplied in the init function of the module. The __min_needed_from value specifies the minimum number of bytes needed for one character of the source character set. The __max_needed_from value specifies the maximum number of bytes, which also includes possible shift sequences.

The __min_needed_to and __max_needed_to values serve the same purpose as __min_needed_from and __max_needed_from but this time for the destination character set.

It is crucial that these values be accurate since otherwise the conversion functions will have problems or not work at all.

int __stateful
This element must also be initialized by the init function. __stateful is nonzero if the source character set is stateful. Otherwise it is zero.
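
As a rough sketch of how an init function might fill in these fields, consider the following. Everything here is an assumption made for illustration: struct demo_step is a stand-in that mirrors only some fields of struct __gconv_step (the real definition lives in `gconv.h'), demo_init is a hypothetical function, and "DEMO-SHIFT" and "DEMO-WIDE" are invented character sets. DEMO-SHIFT is assumed to use one or two bytes per character, optionally preceded by a three-byte escape sequence; DEMO-WIDE is assumed to use exactly four bytes per character.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for the fields of struct __gconv_step discussed above.
   This mock exists only so that the sketch compiles on its own.  */
struct demo_step
{
  const char *from_name;
  const char *to_name;
  int min_needed_from;
  int max_needed_from;
  int min_needed_to;
  int max_needed_to;
  int stateful;
};

/* Byte counts of the two invented character sets.  */
enum { SHIFT_MIN = 1, SHIFT_MAX = 2, SHIFT_ESC = 3, WIDE_SIZE = 4 };

/* Hypothetical init function.  One module serves both directions, so
   the names are used to find out which one is requested.  Returns 0 on
   success and -1 if the conversion is not handled.  */
int
demo_init (struct demo_step *step)
{
  assert (step != NULL);

  if (strcmp (step->from_name, "DEMO-SHIFT") == 0
      && strcmp (step->to_name, "DEMO-WIDE") == 0)
    {
      step->min_needed_from = SHIFT_MIN;
      /* The maximum includes a possible escape sequence.  */
      step->max_needed_from = SHIFT_MAX + SHIFT_ESC;
      step->min_needed_to = WIDE_SIZE;
      step->max_needed_to = WIDE_SIZE;
      step->stateful = 1;       /* the source charset is stateful */
      return 0;
    }

  if (strcmp (step->from_name, "DEMO-WIDE") == 0
      && strcmp (step->to_name, "DEMO-SHIFT") == 0)
    {
      step->min_needed_from = WIDE_SIZE;
      step->max_needed_from = WIDE_SIZE;
      step->min_needed_to = SHIFT_MIN;
      step->max_needed_to = SHIFT_MAX + SHIFT_ESC;
      step->stateful = 0;       /* the source charset is not stateful */
      return 0;
    }

  return -1;
}
```

Note that in this sketch the stateful flag follows the source character set only, as the description of __stateful says, so it differs between the two directions.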

void *__data
This element can be used freely by the conversion functions in the module, for example to communicate extra information from one call to another. It need not be initialized if it is not needed at all. If the __data element is assigned a pointer to dynamically allocated memory (presumably in the init function), the end function must deallocate that memory. Otherwise the application will leak memory.

It is important to be aware that this data structure is shared by all users of this specific conversion and therefore the __data element must not contain data specific to one particular use of the conversion function.
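
The pairing of allocation in the init function and deallocation in the end function might look like the sketch below. The types and functions are hypothetical stand-ins: struct demo_shared_step mirrors only the __data element, and the lookup table is an invented example of data that describes the conversion itself and is therefore safe to share between all users.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Stand-in mirroring only the __data element of struct __gconv_step.  */
struct demo_shared_step
{
  void *data;
};

/* Invented example of shareable data: a read-only byte mapping.  It
   holds nothing that belongs to one particular use of the conversion.  */
struct demo_table
{
  unsigned char map[256];
};

/* Hypothetical init function: allocate and fill the shared table.
   Returns 0 on success, -1 if memory is exhausted.  */
int
demo_shared_init (struct demo_shared_step *step)
{
  assert (step != NULL);

  struct demo_table *table = malloc (sizeof *table);
  if (table == NULL)
    return -1;

  /* Identity mapping as a placeholder for a real conversion table.  */
  for (int i = 0; i < 256; ++i)
    table->map[i] = (unsigned char) i;

  step->data = table;
  return 0;
}

/* Hypothetical end function: release what the init function allocated.  */
void
demo_shared_end (struct demo_shared_step *step)
{
  assert (step != NULL);
  free (step->data);
  step->data = NULL;
}
```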

Data type: struct __gconv_step_data
This is the data structure that contains the information specific to each use of the conversion functions.

char *__outbuf
char *__outbufend
These elements specify the output buffer for the conversion step. The __outbuf element points to the beginning of the buffer, and __outbufend points to the byte following the last byte in the buffer. The conversion function must not assume anything about the size of the buffer but it can be safely assumed that there is room for at least one complete character in the output buffer.

Once the conversion is finished, if the conversion is the last step, the __outbuf element must be modified to point after the last byte written into the buffer to signal how much output is available. If this conversion step is not the last one, the element must not be modified. The __outbufend element must not be modified.
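
The buffer rules above can be illustrated with a small sketch. It is not taken from the library: struct demo_step_data mirrors only three elements of struct __gconv_step_data, and demo_fill_output is a hypothetical helper that copies already converted bytes instead of performing a real conversion.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in mirroring __outbuf, __outbufend and __is_last.  */
struct demo_step_data
{
  char *outbuf;
  char *outbufend;
  int is_last;
};

/* Hypothetical helper: copy up to LEN bytes from SRC into the output
   buffer and return the number of bytes written.  The room is computed
   from the two pointers; no fixed buffer size is assumed.  */
size_t
demo_fill_output (struct demo_step_data *data, const char *src, size_t len)
{
  assert (data != NULL && data->outbuf <= data->outbufend);

  size_t room = (size_t) (data->outbufend - data->outbuf);
  size_t n = len < room ? len : room;
  memcpy (data->outbuf, src, n);

  /* Only the last step advances outbuf to signal how much output is
     available.  Otherwise the element is left alone; a real module
     would hand the written bytes to the next step instead.  */
  if (data->is_last)
    data->outbuf += n;

  /* outbufend is never modified.  */
  return n;
}
```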

int __is_last
This element is nonzero if this conversion step is the last one. This information is necessary for the recursion. See the description of the conversion function internals below. This element must never be modified.

int __invocation_counter
The conversion function can use this element to see how many calls of the conversion function already happened. Some character sets require a certain prolog when generating output, and by comparing this value with zero, one can find out whether it is the first call and whether, therefore, the prolog should be emitted. This element must never be modified.

int __internal_use
This element is another one rarely used but needed in certain situations. It is assigned a nonzero value in case the conversion functions are used to implement mbsrtowcs et al. (i.e., the function is not used directly through the iconv interface).

This sometimes makes a difference as it is expected that the iconv functions are used to translate entire texts while the mbsrtowcs functions are normally used only to convert single strings and might be used multiple times to convert entire texts.

But in this situation we would have a problem complying with some rules of the character set specification. Some character sets require a prolog, which must appear exactly once for an entire text. If a number of mbsrtowcs calls are used to convert the text, only the first call must add the prolog. However, because there is no communication between the different calls of mbsrtowcs, the conversion functions have no possibility to find this out. The situation is different for sequences of iconv calls since the handle allows access to the needed information.

The __internal_use element is mostly used together with __invocation_counter as follows:

 
if (!data->__internal_use
     && data->__invocation_counter == 0)
  /* Emit prolog.  */
  ...

This element must never be modified.
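
To make the effect of this test concrete, the following sketch simulates a sequence of calls and counts how often the prolog would be emitted. The names are hypothetical: struct demo_call_data mirrors only __invocation_counter and __internal_use, and the loop that increments the counter stands in for whatever the library does between calls; the conversion function itself only reads both elements.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in mirroring __invocation_counter and __internal_use.  */
struct demo_call_data
{
  int invocation_counter;
  int internal_use;
};

/* The test from the text: a prolog is due only on the first call and
   only when the function is used through the iconv interface.  */
int
demo_needs_prolog (const struct demo_call_data *data)
{
  assert (data != NULL);
  return !data->internal_use && data->invocation_counter == 0;
}

/* Simulate CALLS successive invocations and return how many of them
   would emit the prolog.  The counter is advanced here, by the caller;
   the conversion function never modifies it.  */
int
demo_count_prologs (int internal_use, int calls)
{
  struct demo_call_data data = { 0, internal_use };
  int emitted = 0;

  for (int i = 0; i < calls; ++i)
    {
      if (demo_needs_prolog (&data))
        ++emitted;
      ++data.invocation_counter;
    }

  return emitted;
}
```

Under these assumptions a sequence of iconv-style calls emits the prolog once, while calls made for internal use never emit it.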

mbstate_t *__statep
The __statep element points to an object of type mbstate_t (see section 6.3.2 Representing the state of the conversion). The conversion of a stateful character set must use the object pointed to by __statep to store information about the conversion state. The __statep element itself must never be modified.

mbstate_t __state
This element must never be used directly. It is only part of this structure to have the needed space allocated.
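
The relationship between the two state elements can be sketched with stand-in types. All names below are assumptions: struct demo_state replaces the opaque mbstate_t, struct demo_stateful_data mirrors only __statep and __state, and the invented character set uses the bytes 0x0E and 0x0F to enter and leave a shifted mode. In this mock the pointer happens to refer to the embedded object; the conversion code does not rely on that and always goes through the pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for mbstate_t, which is opaque in portable code.  */
struct demo_state
{
  int shifted;
};

/* Stand-in mirroring __statep and __state.  */
struct demo_stateful_data
{
  struct demo_state *statep;    /* the only way the state is accessed */
  struct demo_state state;      /* provides the storage, nothing more */
};

/* Set-up normally done outside the conversion function.  */
void
demo_data_init (struct demo_stateful_data *data)
{
  assert (data != NULL);
  data->state.shifted = 0;
  data->statep = &data->state;
}

/* Scan LEN bytes of the invented charset.  Returns the number of
   character bytes seen while in shifted mode.  The shift state is kept
   in the object statep points to, so it survives between calls.  */
size_t
demo_scan (struct demo_stateful_data *data,
           const unsigned char *in, size_t len)
{
  assert (data != NULL && data->statep != NULL);

  size_t shifted_chars = 0;
  for (size_t i = 0; i < len; ++i)
    {
      if (in[i] == 0x0E)
        data->statep->shifted = 1;
      else if (in[i] == 0x0F)
        data->statep->shifted = 0;
      else if (data->statep->shifted)
        ++shifted_chars;
    }

  return shifted_chars;
}
```

If the input is split so that the first call ends in shifted mode, the second call starts in that mode because the state was stored through statep rather than in a local variable.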


  Copyright 2003   by The Free Software Foundation     Updated Jun 2003