From: "Juan Manuel Guerrero"
Organization: Darmstadt University of Technology
To: djgpp-workers AT delorie DOT com
Date: Wed, 21 Feb 2001 16:37:57 +0200
MIME-Version: 1.0
Content-type: text/plain; charset=US-ASCII
Content-transfer-encoding: 7BIT
Subject: Re: gettext pretest available
CC: Eli Zaretskii, Bruno Haible
X-mailer: Pegasus Mail for Windows (v2.54DE)
Message-ID: <29F40F30739@HRZ1.hrz.tu-darmstadt.de>
Reply-To: djgpp-workers AT delorie DOT com

Excuse the delay, but I was busy.

Today I have sent to DJ Delorie alpha ports of Bruno Haible's libiconv-1.5.1 and GNU gettext-2001-02-05. They will *appear* on Simtel.Net as:

They are fully functional, but they are intended only for inspection by the djgpp-workers audience. The patch and the port are based on gettext-2001-02-05.tgz.

Here is the patch that will allow GNU gettext to be compiled out of the box with DJGPP. I will assume that Bruno Haible is not completely familiar with MSDOS/DJGPP, so I will try to be a little more explicit than usual.

When porting Unix software to DOS there are some points that must be considered:
- DOS distinguishes between binary and text files;
- DOS uses both `/' and `\\' as directory separators in file names;
- DOS can have a drive letter X: prepended to a file name;
- DOS has a separate root directory on each drive;
- directories in environment variables (like PATH) are separated by `;' rather than `:';

Apart from these general points, the DJGPP port of GNU gettext shall be able to process .po and other text files properly, regardless of whether they have been created on DOS **or** on UNIX. DJGPP software can be cross-compiled from Linux to MSDOS/DJGPP, so gettext must be able to process files with DOS-style EOL and Unix-style EOL properly. To achieve this, *all* files will be read and written in binary mode. Fortunately, Bruno has already implemented O_TEXT/O_BINARY support to some extent in the sources.
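As a rough illustration of what reading in binary mode implies (this is only a sketch and is *not* code from the patch): once a file is opened in binary mode, the '\r' of a DOS-style EOL reaches the program and must be handled explicitly. The helper name fold_crlf and the check function are hypothetical and operate on a buffer already read into memory.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not taken from the patch: fold every "\r\n" in
   buf[0..len) to "\n" in place and return the new length.  A lone '\r'
   that is not followed by '\n' is kept unchanged.  */
size_t fold_crlf(char *buf, size_t len)
{
  size_t in, out = 0;

  for (in = 0; in < len; in++)
    {
      if (buf[in] == '\r' && in + 1 < len && buf[in + 1] == '\n')
        continue;               /* drop the '\r'; the '\n' is copied next */
      buf[out++] = buf[in];
    }
  return out;
}

/* Hypothetical self-check: a buffer with mixed DOS-style and
   Unix-style EOL ends up with Unix-style EOL only.  */
void check_fold_crlf(void)
{
  char mixed[] = "msgid \"a\"\r\nmsgstr \"b\"\n";
  const char *expected = "msgid \"a\"\nmsgstr \"b\"\n";
  size_t n = fold_crlf(mixed, strlen(mixed));

  assert(n == strlen(expected));
  assert(memcmp(mixed, expected, n) == 0);
}
```

The patch itself teaches the lexer to accept both EOL styles while scanning rather than rewriting a buffer; the sketch only shows the equivalence between the two styles that such code relies on.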
To implement all the above points, I have replaced, or rather modified, some of the macros that Bruno has put into lib/system.h. This is the DJGPP-relevant snippet I have added to lib/system.h, based on Bruno's one:

#include <fcntl.h>
/* For systems that distinguish between text and binary I/O.
   O_BINARY is usually declared in <fcntl.h>. */
#if !defined O_BINARY && defined _O_BINARY
  /* For MSC-compatible compilers. */
# define O_BINARY _O_BINARY
# define O_TEXT _O_TEXT
#endif
#ifdef __BEOS__
  /* BeOS 5 has O_BINARY and O_TEXT, but they have no effect. */
# undef O_BINARY
# undef O_TEXT
#endif
#if O_BINARY
  /* setmode() is usually defined in <io.h>. */
# include <io.h>
# ifdef HAVE_UNISTD_H
  /* isatty() is defined in <unistd.h>. */
#  include <unistd.h>
# endif
# if !(defined(__EMX__) || defined(__DJGPP__))
#  define setmode _setmode
#  define fileno _fileno
# endif
# ifdef __DJGPP__
  /* DJGPP will always read and write all files in binary mode. */
#  define OPEN_RDONLY O_RDONLY|O_BINARY
#  define OPEN_WRONLY O_WRONLY|O_BINARY
#  define READ "rb"
#  define WRITE "wb"
#  define OPENED_IN_BINARY_MODE 1
  /* DJGPP's implementation of basename() knows about all the DOS peculiarities. */
#  undef basename
#  define basename basename
# else /* not __DJGPP__ */
#  define OPEN_RDONLY O_RDONLY
#  define OPEN_WRONLY O_WRONLY
#  define READ "r"
#  define WRITE "w"
#  define OPENED_IN_BINARY_MODE 0
# endif /* not __DJGPP__ */
# define IS_DIR_SEPARATOR(path) (((path)[0]) == '/' || ((path)[0]) == '\\')
# define IS_DEVICE(path) (((path)[0]) && ((path)[1]) == ':')
# define IS_ABSOLUTE_PATH(path) (IS_DIR_SEPARATOR(path) || IS_DEVICE(path))
# define PATH_SEPARATOR ';'
#else /* not O_BINARY */
# define setmode(fd, mode) /* nothing */
# define OPEN_RDONLY O_RDONLY
# define OPEN_WRONLY O_WRONLY
# define READ "r"
# define WRITE "w"
# define IS_ABSOLUTE_PATH(path) (((path)[0]) == '/')
# define PATH_SEPARATOR ':'
# define OPENED_IN_BINARY_MODE 0
#endif /* not O_BINARY */

The above is only FYI; please inspect the patch.
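To make the effect of the DOS branch of these macros more concrete, here is a small sketch. The four macros are copied from the snippet above; the helper count_path_entries and the check function are hypothetical and only serve as illustration, they are not part of the patch.

```c
#include <assert.h>
#include <stddef.h>

/* DOS branch of the macros, copied from the lib/system.h snippet. */
#define IS_DIR_SEPARATOR(path) (((path)[0]) == '/' || ((path)[0]) == '\\')
#define IS_DEVICE(path) (((path)[0]) && ((path)[1]) == ':')
#define IS_ABSOLUTE_PATH(path) (IS_DIR_SEPARATOR(path) || IS_DEVICE(path))
#define PATH_SEPARATOR ';'

/* Hypothetical helper, not part of the patch: count the entries of a
   PATH-like list.  An empty string counts as zero entries.  */
size_t count_path_entries(const char *list)
{
  size_t n;

  if (list[0] == '\0')
    return 0;
  for (n = 1; *list != '\0'; list++)
    if (*list == PATH_SEPARATOR)
      n++;
  return n;
}

/* Hypothetical self-check showing which names count as absolute.  */
void check_path_macros(void)
{
  assert(IS_ABSOLUTE_PATH("c:/djgpp/share/locale"));  /* drive letter */
  assert(IS_ABSOLUTE_PATH("\\djgpp\\bin"));           /* backslash */
  assert(IS_ABSOLUTE_PATH("/dev/env/DJDIR"));         /* slash */
  assert(!IS_ABSOLUTE_PATH("po/de.po"));              /* relative */
  /* ':' cannot separate list entries because it follows drive letters. */
  assert(count_path_entries("c:/djgpp/bin;d:/tools") == 2);
}
```

Note that, as written, IS_DEVICE() accepts any name whose second character is a colon, so a drive-relative name like "c:foo" is also treated as absolute.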

Because not all WinDos compilers want to open all files in binary mode, I have introduced the READ and WRITE macros. For DJGPP they will be set to "rb" and "wb"; for all other compilers they will be set to "r" and "w". Please note that these new macros are used only with those fopen() calls that really open the file with "r". I have left *unchanged* all fopen() calls that explicitly use "rb". The same is true for write access. All this means that the use of READ/WRITE will not interfere with the way the binaries work if they have been compiled with something other than DJGPP. Once again: only binaries produced by DJGPP will open all files in binary mode. All other products will behave in their usual way.

To account for the fact that we may get text files with mixed DOS-style and Unix-style EOL, some code must be added to check for this. This means that the lexer will recognize '\r\n' and '\n' as valid EOL; if '\n' is stripped then '\r\n' will also be stripped, etc. This DJGPP-specific code is guarded by the OPENED_IN_BINARY_MODE macro. Please inspect the patch.

Absolute DOS paths may start with a backslash, a slash or a drive letter followed by a colon. This is handled by IS_ABSOLUTE_PATH(). PATH_SEPARATOR is used to select the OS-specific path separator.

GNU gettext uses a basename() implementation from glibc. It is not worth trying to port this to DOS. I have replaced this call with a call to DJGPP's own basename().

In some places of the sources, setmode() is used. To get the function definition, I have added <io.h>. Please note that I have only Borland 4.0 and MSC 5.10 available to check whether io.h is really in the standard include directory. If this is not true for other compilers, this must be changed accordingly.

On WinDos, unconditionally switching stdin/stdout into binary mode is dangerous **if** stdin is still connected to the console.
This switching inhibits Ctrl-Z (software EOF), Ctrl-C and Ctrl-Break generation, making it impossible for the user to interrupt the program. I have replaced all occurrences of

setmode(fileno(stdin),O_BINARY)

by

if(!isatty(fileno(stdin)))
  setmode(fileno(stdin),O_BINARY);

This is the usual way to treat this issue. Of course, the stdout case has been treated accordingly.

Unfortunately there is a name conflict between GNU's gettext() and DJGPP's Borland-compatibility gettext() defined in conio.h. To resolve this conflict, I have added the following snippet to intl/libgnuintl.h:

#ifdef __DJGPP__
/* This will remove the conflict between the gettext function
   from libintl.h and DJGPP's gettext function from conio.h.
   GNU gettext takes *always* precedence over DJGPP's _conio_gettext. */
# undef gettext
# define gettext gettext
# define __LIBINTL_H_INCLUDED__
#endif /* not __DJGPP__ */

At the same time, DJGPP's gettext() function will be renamed to _conio_gettext and the following snippet will be added to DJGPP's conio.h:

/* This is to resolve the name clash between
   gettext from conio.h and gettext from libintl.h.
   IMPORTANT: If both headers are included, gettext from
   libintl.h takes ALWAYS precedence over gettext from conio.h. */
#ifndef __LIBINTL_H_INCLUDED__
# undef gettext
# define gettext _conio_gettext
#endif

Please note that I will send the patch for the CVS tree in a separate mail.

The goal is to cause as little impact as possible on existing sources. The user who uses GNU gettext will have no difficulty in using libintl.a because there will be no more name collision in the compiling and linking phases. The user who needs the Borland-compatibility gettext can continue using the gettext keyword as long as he does *not* include <libintl.h>. That user will see no difference compared with previous versions of DJGPP's libc.a.
The user who uses both functions in the same source file will have to use _conio_gettext to access the Borland-compatibility function because the gettext() keyword is now *reserved* for GNU gettext.

Once again:
1) #include <libintl.h>
   The gettext keyword refers to GNU gettext().
2) #include <conio.h>
   The gettext keyword refers to DJGPP's Borland-compatibility gettext().
3) #include <libintl.h>
   #include <conio.h>
   The gettext keyword is *always* reserved for GNU gettext(). For DJGPP's Borland-compatibility gettext() the new name _conio_gettext must be used. Of course, the order in which the headers are included does not matter.

To use the on-the-fly recoding functionality provided by libiconv-1.5.1, the files intl/config.charset and intl/localcharset.c must be modified. I want to add the following snippet to intl/config.charset:

*msdosdjgpp*)
  # DJGPP 2.03 doesn't have nl_langinfo(CODESET); therefore
  # localcharset.c falls back to using the full locale name
  # from the environment variables.
  echo "C CP437"
  echo "US-ASCII CP437"
  echo "en_US CP437" # ISO-8859-1
  for l in ca_ES de_AT de_CH de_DE en_AU en_CA en_GB en_ZA eo_EO \
           es_ES es_AR es_BO es_CL es_CO es_CR es_CU es_DO es_EC \
           es_SV es_GT es_HN es_MX es_NI es_PA es_PY es_PE es_UY \
           es_VE eu_ES gl_ES et_EE fi_FI fr_BE fr_CA fr_CH fr_FR \
           ga_IE gd_GB id_ID it_CH it_IT la_LN mt_MT nl_BE nl_NL \
           pt_BR pt_PT sv_SE \
           ca de en es eu eo et fi fr ga gd gl id it la mt nl pt \
           sv; do
    echo "$l CP850" # ISO-8859-1
  done
  for l in cs_CZ hr_HR hu_HU la_LN pl_PL ro_RO sh_YU sk_SK sl_SI \
           sq_AL \
           cs hr hu la pl ro sh sk sl sq; do
    echo "$l CP852" # ISO-8859-2
  done
  for l in tr_TR tr; do
    echo "$l CP857" # ISO-8859-9
  done
  for l in is_IS is; do
    echo "$l CP861" # ISO-8859-10
  done
  for l in he_IL he; do
    echo "$l CP862" # ISO-8859-8
  done
  for l in ar_DZ ar_EG ar_IR ar_IQ ar_JO ar_KW ar_MA ar_OM ar_QA \
           ar_SA ar_SY ar_AE ar; do
    echo "$l CP864" # ISO-8859-6
  done
  for l in da_DK nb_NO nn_NO no_NO \
           da nb nn no; do
    echo "$l CP865" # ISO-8859-1
  done
  for l in be_BE bg_BG mk_MK sr_YU be bg mk sr; do
    echo "$l CP866" # ISO-8859-5
  done
  for l in eo_GR eo; do
    echo "$l CP869" # ISO-8859-7
  done
  for l in th_TH th; do
    echo "$l CP874" # TIS-620
  done
  for l in ru_RU ru_SU ru; do
    echo "$l CP878" # KOI8-R
  done
  for l in ja_JP ja; do
    echo "$l CP932" # Shift-JIS
  done
  for l in zh_CN; do
    echo "$l CP936" # GBK/EUC-CN
  done
  for l in kr_KR kr; do
    echo "$l CP949" # EUC-KR
  done
  for l in zh_TW; do
    echo "$l CP950" # BIG5
  done
  ;;

The entries are in ascending codepage order. The above part of the shell script will create a file called charset.alias that will be installed in $DJDIR/lib and will look like this:

# This file contains a table of character encoding aliases,
# suitable for operating system 'msdosdjgpp'.
# It was automatically generated from config.charset.
# Packages using this file: gettext
C CP437
US-ASCII CP437
en_US CP437
ca_ES CP850
de_AT CP850
de_CH CP850
de_DE CP850
en_AU CP850
en_CA CP850
en_GB CP850
en_ZA CP850
[SNIP]

If the environment variable LANG is set to one of the values contained in the file, the .mo file will be recoded to the corresponding DOS codepage on the fly. Possible values are LL or LL_CC (LL = language code, CC = country code). Please note that GNU gettext-0.10.35 uses LANGUAGE instead. This has changed in gettext-0.10.36. Now LANG *must* be used to set the LL, and LANGUAGE may be used to change this selection. Example:

LANG=de
LANGUAGE=es:de

This will select the Spanish translations; if they are not found, the German ones will be used. If LANG is omitted in the above example, gettext will default to "C". This means that English messages will be printed using CP437, no matter whether LANGUAGE is set or not.

Please note that the shell script snippet is probably not complete. If you see that I have forgotten your language, please notify me and tell me the right codepage so it can be added to the list. Please also note that libiconv is *not* able to recode every existing codepage. This may change in the future.
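
To illustrate how a table like charset.alias maps a LANG value to a codepage, here is a minimal sketch. It is *not* the code from intl/localcharset.c: the function name lookup_codepage, its signature and the check function are hypothetical, and the table is assumed to be already in memory after having been read in binary mode, so both '\r\n' and '\n' line endings are accepted.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not taken from intl/localcharset.c: search the
   text of a charset.alias table for `locale` and copy the associated
   codepage name into `out`.  Returns 1 if found, 0 otherwise.  */
int lookup_codepage(const char *table, const char *locale,
                    char *out, size_t outsize)
{
  size_t loclen = strlen(locale);
  const char *line = table;

  while (*line != '\0')
    {
      /* A line ends at '\n' or at the end of the buffer.  */
      size_t len = strcspn(line, "\n");
      const char *next = line + len + (line[len] == '\n');

      /* Binary mode: drop the '\r' of a DOS-style EOL.  */
      if (len > 0 && line[len - 1] == '\r')
        len--;

      /* Match the whole first field, so "de" does not match "de_DE".  */
      if (len > loclen && line[0] != '#'
          && strncmp(line, locale, loclen) == 0
          && (line[loclen] == ' ' || line[loclen] == '\t'))
        {
          const char *val = line + loclen;
          size_t vallen;

          while (val < line + len && (*val == ' ' || *val == '\t'))
            val++;
          vallen = (size_t) (line + len - val);
          if (vallen > 0 && vallen < outsize)
            {
              memcpy(out, val, vallen);
              out[vallen] = '\0';
              return 1;
            }
        }
      line = next;
    }
  return 0;
}

/* Hypothetical self-check with mixed DOS-style and Unix-style EOL.  */
void check_lookup_codepage(void)
{
  static const char table[] =
    "# comment\r\nC CP437\r\nde_DE CP850\nde CP850\r\n";
  char cp[16];

  assert(lookup_codepage(table, "de", cp, sizeof cp));
  assert(strcmp(cp, "CP850") == 0);
  assert(lookup_codepage(table, "C", cp, sizeof cp));
  assert(strcmp(cp, "CP437") == 0);
  assert(!lookup_codepage(table, "fr", cp, sizeof cp));
}
```

With LANG=de such a lookup would yield CP850; a value that is not in the table yields no codepage at all, which is why missing languages must be added to the list.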
I have created the shell script snippet with the same information I used to create recodepo.sh, that is: the MS-DOS 6.22 COUNTRY.TXT file available from: and:

There are two modifications needed in intl/localcharset.c. First, charset.alias will be opened in binary mode, so the EOL issue matters. Second, locale_charset() calls setlocale() and this must be inhibited because DJGPP's setlocale() only supports 'C' and 'Posix'. As can be seen above, 'C' always implies CP437, and CP437 is useless for all languages except US English. For this purpose I have added the following snippet to intl/localcharset.c:

#if ((__DJGPP__ == 2) && (__DJGPP_MINOR__ <= 3))
/* DJGPP 2.03 and prior only supports C and POSIX. */
# undef HAVE_SETLOCALE
# define HAVE_SETLOCALE 0
#endif

This will inhibit the use of setlocale() and will allow the use of the environment variables LC_ALL, LC_CTYPE and LANG.

To implement all this, the supplied patch will modify the following files:

gettext-2001-02-05/intl/config.charset
gettext-2001-02-05/intl/dcigettext.c
gettext-2001-02-05/intl/gettextP.h
gettext-2001-02-05/intl/libgnuintl.h
gettext-2001-02-05/intl/localcharset.c
gettext-2001-02-05/intl/localealias.c
gettext-2001-02-05/lib/system.h
gettext-2001-02-05/src/Makefile.am
gettext-2001-02-05/src/message.c
gettext-2001-02-05/src/msgcomm.c
gettext-2001-02-05/src/msgfmt.c
gettext-2001-02-05/src/msgunfmt.c
gettext-2001-02-05/src/open-po.c
gettext-2001-02-05/src/po-lex.c
gettext-2001-02-05/src/xget-lex.c
gettext-2001-02-05/src/xgettext.c
gettext-2001-02-05/tests/Makefile.am

It will also add the DJGPP-specific files:

gettext-2001-02-05/djgpp/config.bat
gettext-2001-02-05/djgpp/config.sed
gettext-2001-02-05/djgpp/config.site
gettext-2001-02-05/djgpp/edtest.bat
gettext-2001-02-05/djgpp/fnchange.lst
gettext-2001-02-05/djgpp/readme
gettext-2001-02-05/djgpp/tscript.sed

All these files are needed to configure this package on WinDos.
fnchange.lst contains a list of files that will be renamed on the fly by djtar during package unzipping/untarring. This will guarantee that the sources can be configured, compiled and checked on plain DOS. The files that do not resolve to a valid short file name are all from the testsuite. Also, the files created by the testsuite scripts are not 8.3 clean. They create file names with more than one dot. I do not think it is worth changing the testsuite to remove these name conflicts. The DJGPP user will configure this package by running config.bat. config.bat will call edtest.bat, and this batch file will "patch" the testsuite on the fly so that it will work on plain DOS. All this was already devised for gettext-0.10.35 and is reused here.

It should be noted that the file m4/gettext.m4 contains the following line:

[case " $CONFIG_FILES " in *" po/Makefile.in "*)

I am not really familiar with Autoconf, so I will *not* claim that the above line is a bug, but it should be noted that the pattern

*" po/Makefile.in "*

will inhibit the use of the expression

po/Makefile:po/Makefile.in-in

on all systems that depend on this possibility. Once again: Makefile.in.in is not a legal DOS name and **all** DJGPP ports, not only this one, depend on the possibility of renaming Makefile.in.in to Makefile.in-in. Anyway, config.bat will repair this for DJGPP. Because I do *not* know whether the above line is a bug or not, the supplied patch will ***not*** modify m4/gettext.m4 at all.

config.bat will also modify one rule from src/Makefile. This rule is:

po-gram-gen2.h: po-gram-gen.h
	$(SED) 's/[yY][yY]/po_gram_/g' $(srcdir)/po-gram-gen.h > $@-tmp
	mv $@-tmp $@

It can be seen that po-gram-gen2.h and po-gram-gen.h resolve to the same short file name.

In conclusion: with the patch supplied here, gettext can be configured, compiled, tested, installed and deinstalled (to some extent) out of the box on plain DOS (SFN) and Win9X (LFN).
I have successfully reconfigured, recompiled, tested and installed all existing DJGPP ports that offer NLS. The on-the-fly recoding of the installed .mo files works OK. Nevertheless, it should be realized that a working libiconv.a must **always** be installed.

The use of this gettext port together with libiconv breaks all existing configure scripts from the DJGPP ports. These configure scripts check for gettext() in libintl.a, for bindtextdomain() and for dcgettext(). All these checks are performed by compiling a test program that is linked with libintl.a. But this port of libintl.a has references to libiconv.a, and that library is not linked into the binary at all. As a consequence, the link phase of all test binaries fails. The reason is that the current gettext.m4 macro generates link commands like this:

-lintl

instead of this one:

-lintl -liconv

if libiconv.a has been found on the system. For all existing DJGPP ports I will provide modified config.sed files to fix this as soon as gettext-0.10.36 is officially released. Then I will create the DJGPP packages for the DJGPP users; these will contain the modified versions of config.sed and the changed conio.h and conio.o files.

The patch with a small changelog follows in the next mail. I will submit separate patches for libc and libiconv-1.5.1.

As usual, comments, suggestions, objections and questions are welcome.

Regards,
Guerrero, Juan Manuel