Mail Archives: djgpp-workers/2001/02/21/10:39:34

From: "Juan Manuel Guerrero" <ST001906 AT HRZ1 DOT HRZ DOT TU-Darmstadt DOT De>
Organization: Darmstadt University of Technology
To: djgpp-workers AT delorie DOT com
Date: Wed, 21 Feb 2001 16:37:57 +0200
MIME-Version: 1.0
Subject: Re: gettext pretest available
CC: Eli Zaretskii <eliz AT is DOT elta DOT co DOT il>, Bruno Haible <haible AT ilog DOT fr>
X-mailer: Pegasus Mail for Windows (v2.54DE)
Message-ID: <29F40F30739@HRZ1.hrz.tu-darmstadt.de>
Reply-To: djgpp-workers AT delorie DOT com

Excuse the delay, but I was busy.
Today I sent DJ Delorie alpha ports of Bruno Haible's libiconv-1.5.1
and GNU gettext-2001-02-05. They will *appear* on Simtel.Net as:
  <ftp://ftp.Simtel.Net/pub/simtelnet/gnu/djgpp/v2gnu/alphas/licv151b.zip>
  <ftp://ftp.Simtel.Net/pub/simtelnet/gnu/djgpp/v2gnu/alphas/licv151s.zip>
  <ftp://ftp.Simtel.Net/pub/simtelnet/gnu/djgpp/v2gnu/alphas/gtxt036b.zip>
  <ftp://ftp.Simtel.Net/pub/simtelnet/gnu/djgpp/v2gnu/alphas/gtxt036s.zip>
They are fully functional, but they are intended only for inspection by the
djgpp-workers audience. Both the patch and the port are based on
gettext-2001-02-05.tgz.

Below I describe the patch that allows GNU gettext to be compiled out of the
box with DJGPP. I will assume that Bruno Haible is not completely familiar
with MSDOS/DJGPP, so I will try to be a little more explicit than usual.

When porting Unix software to DOS, there are some points that must be considered:
- DOS distinguishes between binary and text files;
- DOS uses both `/' and `\' as directory separators in file names;
- DOS file names can have a drive letter X: prepended;
- DOS has a separate root directory on each drive;
- directories in environment variables (like PATH) are separated
  by `;' rather than `:'.
Apart from these general points, the DJGPP port of GNU gettext must be able
to process .po and other text files properly, regardless of whether they were
created on DOS **or** on UNIX.
DJGPP software can be cross-compiled from Linux to MSDOS/DJGPP, so gettext
must be able to process files with both DOS-style and Unix-style EOL properly.
To achieve this, *all* files will be read and written in binary mode.
Fortunately, Bruno has already implemented O_TEXT/O_BINARY support to some
extent in the sources. To implement all the above points, I have modified
some of the macros that Bruno put into lib/system.h. This is the
DJGPP-relevant snippet I have added to lib/system.h, based on Bruno's one:

  #include <fcntl.h>
  /* For systems that distinguish between text and binary I/O.
     O_BINARY is usually declared in <fcntl.h>. */
  #if !defined O_BINARY && defined _O_BINARY
    /* For MSC-compatible compilers.  */
  # define O_BINARY  _O_BINARY
  # define O_TEXT    _O_TEXT
  #endif
  #ifdef __BEOS__
    /* BeOS 5 has O_BINARY and O_TEXT, but they have no effect.  */
  # undef O_BINARY
  # undef O_TEXT
  #endif
  #if O_BINARY
  /* setmode() is usually defined in <io.h>. */
  # include <io.h>
  # ifdef HAVE_UNISTD_H
  /* isatty() is defined in <unistd.h>. */
  #  include <unistd.h>
  # endif
  # if !(defined(__EMX__) || defined(__DJGPP__))
  #  define setmode _setmode
  #  define fileno  _fileno
  # endif
  # ifdef __DJGPP__
  /* DJGPP will always read and write all files in binary mode. */
  #  define OPEN_RDONLY           (O_RDONLY | O_BINARY)
  #  define OPEN_WRONLY           (O_WRONLY | O_BINARY)
  #  define READ                  "rb"
  #  define WRITE                 "wb"
  #  define OPENED_IN_BINARY_MODE 1
  /* DJGPP's implementation of basename()
     knows about all the DOS peculiarities. */
  #  undef  basename
  #  define basename basename
  # else /* not __DJGPP__ */
  #  define OPEN_RDONLY           O_RDONLY
  #  define OPEN_WRONLY           O_WRONLY
  #  define READ                  "r"
  #  define WRITE                 "w"
  #  define OPENED_IN_BINARY_MODE 0
  # endif /* not __DJGPP__ */
  # define IS_DIR_SEPARATOR(path) (((path)[0]) == '/' || ((path)[0]) == '\\')
  # define IS_DEVICE(path)        (((path)[0]) && ((path)[1]) == ':')
  # define IS_ABSOLUTE_PATH(path) (IS_DIR_SEPARATOR(path) || IS_DEVICE(path))
  # define PATH_SEPARATOR         ';'
  #else /* not O_BINARY */
  # define setmode(fd, mode)      /* nothing */
  # define OPEN_RDONLY            O_RDONLY
  # define OPEN_WRONLY            O_WRONLY
  # define READ                   "r"
  # define WRITE                  "w"
  # define IS_ABSOLUTE_PATH(path) (((path)[0]) == '/')
  # define PATH_SEPARATOR         ':'
  # define OPENED_IN_BINARY_MODE  0
  #endif /* not O_BINARY */

The above is only for your information; please inspect the patch.
Because not all WinDos compilers are meant to open all files in binary mode,
I have introduced the READ and WRITE macros. For DJGPP they are set to "rb"
and "wb"; for all other compilers they are set to "r" and "w".
Please note that these new macros are used only in those fopen() calls
that really open the file with "r". I have left *unchanged* all fopen() calls
that explicitly use "rb". The same is true for write access. This means that
the use of READ/WRITE will not interfere with the way the binaries work if
they have been compiled with something other than DJGPP. Once again: only
DJGPP-produced binaries will open all files in binary mode; all other builds
will behave in their usual way.
To account for the fact that we may get text files with mixed DOS-style and
Unix-style EOL, some code must be added to check for this: the lexer will
recognize both '\r\n' and '\n' as a valid EOL, and wherever a trailing '\n'
is stripped, a trailing '\r\n' is stripped as a whole. This DJGPP-specific
code is guarded by the OPENED_IN_BINARY_MODE macro. Please inspect the patch.
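
A minimal sketch of the EOL rule described above, assuming a stand-alone
helper; the name strip_eol() is hypothetical and the real code in
src/po-lex.c is different. In binary mode the C library leaves "\r\n"
untranslated, so when OPENED_IN_BINARY_MODE is set, a '\r' directly in
front of the stripped '\n' is removed as well:

```c
#include <string.h>

/* Assumed value for a DJGPP build; normally this comes from lib/system.h. */
#ifndef OPENED_IN_BINARY_MODE
# define OPENED_IN_BINARY_MODE 1
#endif

/* Hypothetical helper: remove a trailing Unix ("\n") or DOS ("\r\n")
   end-of-line from LINE in place and return the new length. */
size_t strip_eol(char *line)
{
  size_t len = strlen(line);

  if (len > 0 && line[len - 1] == '\n')
    {
      line[--len] = '\0';
#if OPENED_IN_BINARY_MODE
      /* Binary mode keeps the '\r' of a DOS-style EOL, so drop it too. */
      if (len > 0 && line[len - 1] == '\r')
        line[--len] = '\0';
#endif
    }
  return len;
}
```

A line obtained with fopen(name, READ) and fgets() then looks the same
whether the .po file was written on DOS or on UNIX.
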
Absolute DOS paths may start with a backslash, a slash, or a drive letter
followed by a colon; this is handled by IS_ABSOLUTE_PATH(). PATH_SEPARATOR
selects the OS-specific separator for lists of directories. GNU gettext uses
a basename() implementation from glibc; it is not worth trying to port it to
DOS, so I have replaced this call by a call to DJGPP's own basename().
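
To show how these macros behave, the sketch below copies IS_DIR_SEPARATOR(),
IS_DEVICE(), IS_ABSOLUTE_PATH() and PATH_SEPARATOR from the O_BINARY branch
of lib/system.h and uses them in a hypothetical helper,
count_absolute_dirs(), that walks a DOS-style PATH list:

```c
#include <stddef.h>

/* Copied from the O_BINARY (DOS/Windows) branch of lib/system.h. */
#define IS_DIR_SEPARATOR(path) (((path)[0]) == '/' || ((path)[0]) == '\\')
#define IS_DEVICE(path)        (((path)[0]) && ((path)[1]) == ':')
#define IS_ABSOLUTE_PATH(path) (IS_DIR_SEPARATOR(path) || IS_DEVICE(path))
#define PATH_SEPARATOR         ';'

/* Hypothetical helper: count the entries of a PATH-like list that
   IS_ABSOLUTE_PATH() accepts.  Entries are separated by PATH_SEPARATOR. */
int count_absolute_dirs(const char *list)
{
  int count = 0;
  const char *entry = list;

  for (;;)
    {
      /* Inspect the first characters of the current (non-empty) entry. */
      if (entry[0] != '\0' && entry[0] != PATH_SEPARATOR
          && IS_ABSOLUTE_PATH(entry))
        count++;
      /* Advance to the next entry, or stop at the end of the list. */
      while (*entry != '\0' && *entry != PATH_SEPARATOR)
        entry++;
      if (*entry == '\0')
        break;
      entry++;
    }
  return count;
}
```

For the list c:/djgpp/bin;\dos;bin;d: it counts three absolute entries,
because "bin" is relative. Note that, as written, IS_DEVICE() also accepts
drive-relative names such as "c:foo".
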
In some places in the sources, setmode() is used. To get its declaration,
I have included <io.h>. Please note that I only have Borland 4.0 and MSC 5.10
available to check whether io.h really is in the standard include directory.
If this is not true for other compilers, this must be changed accordingly.
On WinDos, unconditionally switching stdin/stdout into binary mode is
dangerous **if** stdin is still connected to the console. The switch
inhibits the generation of Ctrl-Z (software EOF), Ctrl-C and Ctrl-Break,
making it impossible for the user to interrupt the program. I have replaced
all occurrences of
  setmode(fileno(stdin),O_BINARY)
by
  if(!isatty(fileno(stdin)))
    setmode(fileno(stdin),O_BINARY);
This is the usual way to treat this issue. Of course, the stdout case has
been treated accordingly.
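
The same guard can be wrapped in a small helper. This is a sketch with a
hypothetical name, set_binary_unless_console(); like lib/system.h, it
assumes setmode() comes from <io.h> wherever O_BINARY exists, and it turns
the call into a no-op elsewhere:

```c
#include <fcntl.h>
#include <unistd.h>   /* isatty() */

#if defined O_BINARY && O_BINARY
# include <io.h>      /* setmode() on DJGPP and other DOS compilers */
#else
# define setmode(fd, mode) /* nothing: no text/binary distinction */
#endif

/* Hypothetical helper: put FD into binary mode only if it is redirected.
   Returns 1 if setmode() was requested, 0 if FD is a console (tty). */
int set_binary_unless_console(int fd)
{
  if (isatty(fd))
    return 0;   /* stay in text mode so Ctrl-Z, Ctrl-C and Ctrl-Break work */
  setmode(fd, O_BINARY);
  return 1;
}
```

Calling it as set_binary_unless_console(fileno(stdin)) and
set_binary_unless_console(fileno(stdout)) reproduces the guard shown above.
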

Unfortunately there is a name conflict between GNU's gettext() and DJGPP's
Borland-compatibility gettext() defined in conio.h. To resolve this
conflict, I have added the following snippet to intl/libgnuintl.h:

#ifdef __DJGPP__
/* This will remove the conflict between the gettext function
   from libintl.h and DJGPP's gettext function from conio.h.
   GNU gettext *always* takes precedence over DJGPP's _conio_gettext. */
# undef   gettext
# define  gettext gettext
# define  __LIBINTL_H_INCLUDED__
#endif /* __DJGPP__ */

At the same time, DJGPP's gettext() function will be renamed to _conio_gettext,
and the following snippet will be added to DJGPP's conio.h:
/* This is to resolve the name clash between
   gettext from conio.h and gettext from libintl.h.
   IMPORTANT:
   If both headers are included, gettext from libintl.h
   ALWAYS takes precedence over gettext from conio.h. */
#ifndef __LIBINTL_H_INCLUDED__
# undef  gettext
# define gettext _conio_gettext
#endif

Please note that I will send the patch for the CVS tree in a separate mail.
The goal is to have as little impact as possible on existing sources. A user
who uses GNU gettext will have no difficulty using libintl.a, because there
will be no name collision any more in the compile and link phases. A user who
needs the Borland-compatibility gettext can continue using the name gettext
as long as he does *not* include <libintl.h>; he will see no difference
compared with previous versions of DJGPP's libc.a. A user who uses both
functions in the same source file will have to use _conio_gettext to access
the Borland-compatibility function, because the name gettext is now
*reserved* for GNU gettext. Once again:
1) #include <libintl.h>
   The name gettext refers to GNU gettext().
2) #include <conio.h>
   The name gettext refers to DJGPP's Borland-compatibility gettext().
3) #include <libintl.h>
   #include <conio.h>
   The name gettext is *always* reserved for GNU gettext().
   For DJGPP's Borland-compatibility gettext() the new name _conio_gettext
   must be used. The order in which the headers are included does not
   matter.
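
The precedence rules can be checked without DJGPP at hand. The sketch below
pastes the two header snippets into one file (dropping the #ifdef __DJGPP__
wrapper so it compiles anywhere) and uses stringification to record what the
name gettext expands to after each step; STR(), STR_() and the variable
names are illustrative only:

```c
#include <string.h>

/* Two-level stringification: STR(gettext) yields the expansion of gettext. */
#define STR_(x) #x
#define STR(x)  STR_(x)

/* Step 1: the new conio.h snippet, as if <conio.h> were included first. */
#ifndef __LIBINTL_H_INCLUDED__
# undef  gettext
# define gettext _conio_gettext
#endif
const char *const after_conio = STR(gettext);     /* "_conio_gettext" */

/* Step 2: the DJGPP snippet from intl/libgnuintl.h. */
#undef  gettext
#define gettext gettext
#define __LIBINTL_H_INCLUDED__
const char *const after_libintl = STR(gettext);   /* "gettext" */

/* Step 3: the conio.h snippet again, i.e. <conio.h> included after
   <libintl.h>: the guard is now defined, so nothing changes. */
#ifndef __LIBINTL_H_INCLUDED__
# undef  gettext
# define gettext _conio_gettext
#endif
const char *const after_both = STR(gettext);      /* still "gettext" */

/* Returns 1 if GNU gettext keeps the name once libintl.h has been seen. */
int gnu_gettext_wins(void)
{
  return strcmp(after_libintl, "gettext") == 0
         && strcmp(after_both, "gettext") == 0;
}
```

Steps 1 and 2 model <conio.h> before <libintl.h>, and steps 2 and 3 model the
opposite order; in both cases the name gettext ends up meaning GNU gettext.
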

To use the on-the-fly recoding functionality provided by libiconv-1.5.1,
the files intl/config.charset and intl/localcharset.c must be modified.
I want to add the following snippet to intl/config.charset:
      *msdosdjgpp*)
  	# DJGPP 2.03 doesn't have nl_langinfo(CODESET); therefore
  	# localcharset.c falls back to using the full locale name
  	# from the environment variables.
  	echo "C        CP437"
  	echo "US-ASCII CP437"
  	echo "en_US    CP437"  # ISO-8859-1
  
  	for l in ca_ES de_AT de_CH de_DE en_AU en_CA en_GB en_ZA eo_EO \
                   es_ES es_AR es_BO es_CL es_CO es_CR es_CU es_DO es_EC \
                   es_SV es_GT es_HN es_MX es_NI es_PA es_PY es_PE es_UY \
                   es_VE eu_ES gl_ES et_EE fi_FI fr_BE fr_CA fr_CH fr_FR \
                   ga_IE gd_GB id_ID it_CH it_IT la_LN mt_MT nl_BE nl_NL \
                   pt_BR pt_PT sv_SE \
                   ca de en es eu eo et fi fr ga gd gl id it la mt nl pt \
                   sv; do
  	  echo "$l    CP850"  # ISO-8859-1
  	done
  	for l in cs_CZ hr_HR hu_HU la_LN pl_PL ro_RO sh_YU sk_SK sl_SI \
                   sq_AL \
                   cs hr hu la pl ro sh sk sl sq; do
  	  echo "$l    CP852"  # ISO-8859-2
  	done
  	for l in tr_TR tr; do
  	  echo "$l    CP857"  # ISO-8859-9
  	done
  	for l in is_IS is; do
  	  echo "$l    CP861"  # ISO-8859-10
  	done
  	for l in he_IL he; do
  	  echo "$l    CP862"  # ISO-8859-8
  	done
  	for l in ar_DZ ar_EG ar_IR ar_IQ ar_JO ar_KW ar_MA ar_OM ar_QA \
                   ar_SA ar_SY ar_AE ar; do
  	  echo "$l    CP864"  # ISO-8859-6
  	done
  	for l in da_DK nb_NO nn_NO no_NO \
                   da nb nn no; do
  	  echo "$l    CP865"  # ISO-8859-1
  	done
  	for l in be_BY bg_BG mk_MK sr_YU be bg mk sr; do
  	  echo "$l    CP866"  # ISO-8859-5
  	done
  	for l in el_GR el; do
  	  echo "$l    CP869"  # ISO-8859-7
  	done
  	for l in th_TH th; do
  	  echo "$l    CP874"  # TIS-620
  	done
  	for l in ru_RU ru_SU ru; do
  	  echo "$l    CP878"  # KOI8-R
  	done
  	for l in ja_JP ja; do
  	  echo "$l    CP932"  # Shift-JIS
  	done
  	for l in zh_CN; do
  	  echo "$l    CP936"  # GBK/EUC-CN
  	done
  	for l in ko_KR ko; do
  	  echo "$l    CP949"  # EUC-KR
  	done
  	for l in zh_TW; do
  	  echo "$l    CP950"  # BIG5
  	done
  	;;
The entries are listed in ascending codepage order.
This part of the shell script creates a file called charset.alias,
which will be installed in $DJDIR/lib and looks like this:

  # This file contains a table of character encoding aliases,
  # suitable for operating system 'msdosdjgpp'.
  # It was automatically generated from config.charset.
  # Packages using this file: gettext 
  C        CP437
  US-ASCII CP437
  en_US    CP437
  ca_ES    CP850
  de_AT    CP850
  de_CH    CP850
  de_DE    CP850
  en_AU    CP850
  en_CA    CP850
  en_GB    CP850
  en_ZA    CP850
  [SNIP]
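
The table is a plain two-column list, so a lookup is simple. A minimal
sketch, assuming the file has already been opened (in binary mode, as
explained for intl/localcharset.c further below); lookup_charset_alias() is
a hypothetical name and this is not the parser used in intl/localcharset.c:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: find LOCALE in an open charset.alias stream and copy
   the matching codepage into OUT.  Returns 1 if found, 0 otherwise. */
int lookup_charset_alias(FILE *fp, const char *locale,
                         char *out, size_t outsize)
{
  char line[256], name[64], codepage[64];

  while (fgets(line, sizeof line, fp) != NULL)
    {
      const char *p = line + strspn(line, " \t");

      /* Skip comment lines and empty lines (Unix or DOS style). */
      if (*p == '#' || *p == '\n' || *p == '\r' || *p == '\0')
        continue;
      /* %63s stops at blanks, '\r' and '\n', so the file may have been
         written with either EOL convention. */
      if (sscanf(p, "%63s %63s", name, codepage) == 2
          && strcmp(name, locale) == 0
          && strlen(codepage) < outsize)
        {
          strcpy(out, codepage);
          return 1;
        }
    }
  return 0;
}
```

With LANG=de_DE and the table above, the lookup yields CP850, the codepage
the .mo file is recoded to.
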

If the environment variable LANG is set to one of the values contained in
this file, the .mo file will be recoded on the fly to the corresponding DOS
codepage. Possible values are LL or LL_CC (LL = language code, CC = country
code). Please note that GNU gettext-0.10.35 uses LANGUAGE instead. This has
changed in gettext-0.10.36: now LANG *must* be used to set LL, and LANGUAGE
may be used to change this selection. Example:
  LANG=de
  LANGUAGE=es:de
This will select the Spanish translations; if they are not found, the German
ones will be used. If LANG is omitted in the above example, gettext will
default to "C", which means that English messages will be printed using
CP437, regardless of whether LANGUAGE is set.
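
The selection rule can be modelled in a few lines. This is a simplified
sketch, not gettext's code: pick_catalog() is hypothetical, it treats
LANGUAGE as a colon-separated priority list, ignores LANGUAGE when LANG is
unset or "C", and omits gettext's other fallbacks (such as trying LL when
LL_CC has no catalog):

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return the index into AVAILABLE (a NULL-terminated
   list of catalog names) that would be used, or -1 for English messages. */
int pick_catalog(const char *lang, const char *language,
                 const char *const available[])
{
  const char *p;
  size_t i;

  /* LANG unset, empty or "C": LANGUAGE is ignored, messages stay English. */
  if (lang == NULL || lang[0] == '\0' || strcmp(lang, "C") == 0)
    return -1;

  /* Without LANGUAGE, LANG alone selects the catalog. */
  if (language == NULL || language[0] == '\0')
    language = lang;

  /* Walk the colon-separated priority list from left to right. */
  for (p = language; *p != '\0'; )
    {
      size_t len = strcspn(p, ":");

      for (i = 0; available[i] != NULL; i++)
        if (strlen(available[i]) == len && strncmp(available[i], p, len) == 0)
          return (int) i;
      p += len;
      if (*p == ':')
        p++;
    }
  return -1;
}
```

For the example above, pick_catalog("de", "es:de", ...) returns the Spanish
catalog when one is installed and falls back to the German one otherwise.
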
Please note that the shell script snippet is probably not complete. If you
see that I have forgotten your language, please notify me and tell me the
right codepage so it can be added to the list. Please also note that
libiconv is *not* able to recode every existing codepage. This may change in
the future.
I created the shell script snippet from the same information I used to
create recodepo.sh, that is:
  MS-DOS 6.22 COUNTRY.TXT file
available from:
  <ftp://ftp.microsoft.com/peropsys/msdos/kb/q117/8/50.txt>
and:
  <http://www.cs.tu-berlin.de/~cyzborra/charsets/iso8859.html>
  <http://www.cs.tu-berlin.de/~cyzborra/charsets/iso646.html>
  <http://www.cs.tu-berlin.de/~cyzborra/charsets/codepages.html>
  <http://www.cs.tu-berlin.de/~cyzborra/charsets/cyrillic.html>
  <http://www.cs.tu-berlin.de/~cyzborra/charsets/cjk.html>

There are two modifications needed in intl/localcharset.c. First,
charset.alias will be opened in binary mode, so the EOL issue matters.
Second, locale_charset() calls setlocale(), and this must be inhibited
because DJGPP's setlocale() only supports "C" and "POSIX". As can be seen
above, "C" always implies CP437, and CP437 is useless for all languages
except US English. For this purpose I have added the following snippet to
intl/localcharset.c:
  #if ((__DJGPP__ == 2) && (__DJGPP_MINOR__ <= 3))
  /* DJGPP 2.03 and prior only supports C and POSIX. */
  # undef  HAVE_SETLOCALE
  # define HAVE_SETLOCALE 0
  #endif

This will inhibit the use of setlocale() and will allow the use of the
environment variables LC_ALL, LC_CTYPE and LANG.
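
A sketch of that environment lookup, assuming the usual POSIX precedence
(LC_ALL, then LC_CTYPE, then LANG, with empty values ignored) and a "C"
default when none is set; locale_name_from_env() is a hypothetical name:

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: locale name used when setlocale() is disabled. */
const char *locale_name_from_env(void)
{
  static const char *const vars[] = { "LC_ALL", "LC_CTYPE", "LANG" };
  size_t i;

  for (i = 0; i < sizeof vars / sizeof vars[0]; i++)
    {
      const char *value = getenv(vars[i]);

      /* The first variable that is set to a non-empty value wins. */
      if (value != NULL && value[0] != '\0')
        return value;
    }
  return "C";   /* assumed default when no variable is set */
}
```

The name obtained this way is what gets looked up in charset.alias; with
LANG=de_DE and the table above, the result is CP850.
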


To implement all this, the supplied patch will modify the following files:
  gettext-2001-02-05/intl/config.charset
  gettext-2001-02-05/intl/dcigettext.c
  gettext-2001-02-05/intl/gettextP.h
  gettext-2001-02-05/intl/libgnuintl.h
  gettext-2001-02-05/intl/localcharset.c
  gettext-2001-02-05/intl/localealias.c
  gettext-2001-02-05/lib/system.h
  gettext-2001-02-05/src/Makefile.am
  gettext-2001-02-05/src/message.c
  gettext-2001-02-05/src/msgcomm.c
  gettext-2001-02-05/src/msgfmt.c
  gettext-2001-02-05/src/msgunfmt.c
  gettext-2001-02-05/src/open-po.c
  gettext-2001-02-05/src/po-lex.c
  gettext-2001-02-05/src/xget-lex.c
  gettext-2001-02-05/src/xgettext.c
  gettext-2001-02-05/tests/Makefile.am

It will also add the DJGPP specific files:
  gettext-2001-02-05/djgpp/config.bat
  gettext-2001-02-05/djgpp/config.sed
  gettext-2001-02-05/djgpp/config.site
  gettext-2001-02-05/djgpp/edtest.bat
  gettext-2001-02-05/djgpp/fnchange.lst
  gettext-2001-02-05/djgpp/readme
  gettext-2001-02-05/djgpp/tscript.sed

All these files are needed to configure this package on WinDos.
fnchange.lst contains a list of files that will be renamed on the fly
by djtar while unzipping/untarring the package. This guarantees that
the sources can be configured, compiled and checked on plain DOS.
The files that do not resolve to a valid short file name all come from the
test suite. The files created by the test suite scripts are not 8.3-clean
either: their names contain more than one dot. I do not think it is worth
changing the test suite to remove these name conflicts. The DJGPP user will
configure this package by running config.bat. config.bat calls edtest.bat,
and this batch file "patches" the test suite on the fly so that it works
on plain DOS. All this was already devised for gettext-0.10.35 and is
reused here.
It should be noted that the file m4/gettext.m4 contains the following line:
     [case " $CONFIG_FILES " in *" po/Makefile.in "*)
I am not really familiar with Autoconf, so I will *not* claim that the
above line is a bug, but it should be noted that the pattern:
  *" po/Makefile.in "*
will inhibit the use of the expression:
  po/Makefile:po/Makefile.in-in
on all systems that depend on this possibility.
Once again: Makefile.in.in is not a legal DOS name, and **all** DJGPP ports,
not only this one, depend on the possibility of renaming Makefile.in.in
to Makefile.in-in. In any case, config.bat will repair this for DJGPP.
Because I do *not* know whether the above line is a bug or not, the supplied
patch will ***not*** modify m4/gettext.m4 at all.
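
The effect of that case pattern can be reproduced with the POSIX fnmatch()
function, which uses the same wildcard syntax as a shell case statement.
A sketch; po_makefile_case_matches() is a hypothetical name and the
512-byte buffer is an arbitrary limit:

```c
#include <fnmatch.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the case pattern from m4/gettext.m4
   matches the given CONFIG_FILES value, 0 otherwise. */
int po_makefile_case_matches(const char *config_files)
{
  char padded[512];
  size_t len = strlen(config_files);

  if (len + 3 > sizeof padded)
    return 0;   /* too long for this sketch */

  /* Build " $CONFIG_FILES ", exactly as the shell code does. */
  padded[0] = ' ';
  memcpy(padded + 1, config_files, len);
  padded[len + 1] = ' ';
  padded[len + 2] = '\0';

  /* *" po/Makefile.in "* in shell syntax; without FNM_PATHNAME,
     '*' also matches '/', as in a shell case statement. */
  return fnmatch("* po/Makefile.in *", padded, 0) == 0;
}
```

po_makefile_case_matches("Makefile po/Makefile.in") returns 1, while
po_makefile_case_matches("Makefile po/Makefile:po/Makefile.in-in") returns 0,
because in the DJGPP form "po/Makefile.in" is neither preceded nor followed
by a space.
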
config.bat will also modify the following rule in src/Makefile:

po-gram-gen2.h: po-gram-gen.h
	$(SED) 's/[yY][yY]/po_gram_/g' $(srcdir)/po-gram-gen.h > $@-tmp
	mv $@-tmp $@

It can be seen that po-gram-gen2.h and po-gram-gen.h resolve to the same
short file name on plain DOS (both are truncated to PO-GRAM-.H).
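
The collision is easy to reproduce with a simplified model of plain-DOS
name truncation (at most 8 characters before the dot and 3 after it).
A sketch: to_short_name() is hypothetical, and it ignores the upper-casing
and the character restrictions that real DOS also applies:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified 8.3 truncation: keep at most 8 characters before
   the first dot and at most 3 after the last dot.  OUT needs 13 bytes. */
void to_short_name(const char *name, char *out)
{
  const char *first_dot = strchr(name, '.');
  const char *last_dot = strrchr(name, '.');
  size_t base_len = first_dot ? (size_t) (first_dot - name) : strlen(name);

  if (base_len > 8)
    base_len = 8;
  memcpy(out, name, base_len);
  out[base_len] = '\0';

  if (last_dot != NULL)
    {
      size_t ext_len = strlen(last_dot + 1);

      if (ext_len > 3)
        ext_len = 3;
      out[base_len] = '.';
      memcpy(out + base_len + 1, last_dot + 1, ext_len);
      out[base_len + 1 + ext_len] = '\0';
    }
}
```

Both "po-gram-gen.h" and "po-gram-gen2.h" map to "po-gram-.h", so the
rule's target would overwrite its own input on plain DOS.
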

In conclusion: with the patch supplied here, gettext can be configured,
compiled, tested, installed and uninstalled (to some extent) out of the box
on plain DOS (SFN) and Win9X (LFN).
I have successfully reconfigured, recompiled, tested and installed all
existing DJGPP ports that offer NLS. The on-the-fly recoding of the
installed .mo files works fine.
Nevertheless, it should be realized that a working libiconv.a must **always**
be installed. Using this gettext port together with libiconv breaks all
existing configure scripts of the DJGPP ports. These configure scripts
check for gettext() in libintl.a, for bindtextdomain() and for dcgettext().
All these checks are performed by compiling a test program that is linked
with libintl.a. But this port of libintl.a has references to libiconv.a,
and that library is not linked into the binary at all. As a result, the
link phase of all test binaries fails. The reason is that the current
gettext.m4 macro generates link commands like this:
  -lintl
instead of this one:
  -lintl -liconv
if libiconv.a has been found on the system. For all existing DJGPP
ports I will provide modified config.sed files to fix this as soon as
gettext-0.10.36 is officially released. Then I will create the DJGPP
packages for DJGPP users; they will contain these modified versions
of config.sed and the changed conio.h and conio.o files.
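
A minimal sketch of the kind of program such a configure check links; the
function name nls_link_probe() is hypothetical, and real configure scripts
generate their own conftest.c. It only has to reference gettext(),
bindtextdomain() and dcgettext():

```c
#include <libintl.h>
#include <locale.h>   /* LC_MESSAGES */

/* Hypothetical probe: references the three functions the existing
   configure scripts check for.  Returns 1 if every call returns non-NULL. */
int nls_link_probe(void)
{
  const char *dir = bindtextdomain("conftest", ".");
  const char *msg = gettext("hello");
  const char *dmsg = dcgettext("conftest", "hello", LC_MESSAGES);

  return dir != NULL && msg != NULL && dmsg != NULL;
}
```

With this port, such a program only links on DJGPP with a command like
"gcc conftest.c -lintl -liconv": -liconv has to come after -lintl, because
the linker resolves libintl.a's references to libiconv.a only from
libraries listed later on the command line.
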

The patch with a small changelog follows in the next mail.
I will submit separate patches for libc and libiconv-1.5.1.
As usual, comments, suggestions, objections, questions are welcome.

Regards,
Guerrero, Juan Manuel

  Copyright © 2019   by DJ Delorie     Updated Jul 2019