Mail Archives: djgpp-workers/2001/03/16/04:48:11

Date: Fri, 16 Mar 2001 11:46:32 +0200
From: "Eli Zaretskii" <eliz AT is DOT elta DOT co DOT il>
Sender: halo1 AT zahav DOT net DOT il
To: djgpp-workers AT delorie DOT com
Message-Id: <7263-Fri16Mar2001114631+0200-eliz@is.elta.co.il>
X-Mailer: Emacs 20.6 (via feedmail 8.3.emacs20_6 I) and Blat ver 1.8.6
CC: ST001906 AT HRZ1 DOT HRZ DOT TU-Darmstadt DOT De, recode-bugs AT IRO DOT UMontreal DOT CA,
djgpp-workers AT delorie DOT com
In-reply-to: <15025.20732.410053.828022@honolulu.ilog.fr> (message from Bruno
Haible on Fri, 16 Mar 2001 00:32:12 +0100 (CET))
Subject: Re: OS/DJGPP specific difficulties with recode 3.6
References: <4B62C66334B AT HRZ1 DOT hrz DOT tu-darmstadt DOT de> <15025 DOT 20732 DOT 410053 DOT 828022 AT honolulu DOT ilog DOT fr>
Reply-To: djgpp-workers AT delorie DOT com
Errors-To: nobody AT delorie DOT com
X-Mailing-List: djgpp-workers AT delorie DOT com
X-Unsubscribes-To: listserv AT delorie DOT com

> From: Bruno Haible <haible AT ilog DOT fr>
> Date: Fri, 16 Mar 2001 00:32:12 +0100 (CET)
>
> > I will only show the output that diff produces for the first test:
> > 7. ./dumps.m4:3         --- -	Thu Mar 15 15:38:49 2001
> > +++ stdout	Thu Mar 15 15:38:48 2001
> > @@ -1,21 +1,23 @@
> > - 10
> > - 97,  10
> > - 97,  98,  10
> > - 97,  98,  99,  10
> > - 97,  98,  99, 100,  10
> > - 97,  98,  99, 100, 101, 102, 103, 104, 105,  10
> > + 13,  10
> > + 97,  13,  10
> > + 97,  98,  13,  10
> > + 97,  98,  99,  13,  10
> > + 97,  98,  99, 100,  13,  10
> > + 97,  98,  99, 100, 101, 102, 103, 104, 105,  13,  10
> 
> CR/LF. The tests apparently expect a Unix compatible 'echo' command.

No, it expects `echo' to produce Unix-style LF-only EOLs.  The test
suite _does_ use a Unix compatible `echo', which comes from ported GNU
Sh-utils.

When I worked on recode 3.4 and 3.5, I asked Francois why the test
suite doesn't specify the surface explicitly, as in "foo..bar/".  This
would make the EOL format of generated files predictable.  Also, the
test suite should IMHO not assume any specific EOL-related behavior
from the programs it invokes, other than recode itself.  In many cases,
using recode (with a trivial conversion spec) instead of echo is a
much better alternative, since it allows explicit control of the EOL
format in produced files.  IMHO, this way we could eliminate many of
the horrible hacks that need to be added to the distribution to make
the test suite work on non-Posix platforms, and as a bonus, the test
suite would suffer from much less bit-rot than what we see now.

I don't think Francois had time to reply to those suggestions, but
perhaps they can be considered now.

> The assumption that all non-Microsoft-OS users are in a Latin1 locale
> is broken. The assumption that all DOS users use the IBM-PC = CP437
> character set is broken as well. You made a list of all character
> encodings used in DOS for config.charset, a few weeks ago, didn't you?

Nevertheless, it is IMHO important to have a reasonable default.  If
the charset is not specified by the user or the environment, I suggest
the following fallback procedure:

  - try to estimate the codepage from the country code (the latter is
    returned by a special system call);
  - if that fails, look at DEFAULT_CHARSET;
  - if that fails as well, use cp437 as the last resort.

It's true that cp437 is not a universal default, but in the absence of
the other two fallbacks, it's good enough, because that's how a
bare-bones DOS system with an empty CONFIG.SYS behaves.
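
The fallback order described above could be sketched roughly as
follows.  This is only an illustration, not code from recode or
libiconv: the function names, the table, and its four entries are
made up for the example, and the caller is assumed to have already
obtained the country code from DOS (passing a negative value if that
call failed) and the value of DEFAULT_CHARSET (NULL if unset).

```c
#include <stddef.h>
#include <string.h>

/* Illustrative country-code -> codepage table.  The entries are
   assumptions for this sketch, not a complete or authoritative list.  */
struct country_cp
{
  int country;           /* DOS country code */
  const char *charset;   /* charset name to assume for that country */
};

static const struct country_cp country_table[] = {
  {   1, "cp437" },
  {   7, "cp866" },
  {  49, "cp850" },
  { 972, "cp862" },
};

/* Step 1: estimate the codepage from the country code.
   Returns NULL if the country is not in the table.  */
static const char *
charset_from_country (int country)
{
  size_t i;

  for (i = 0; i < sizeof country_table / sizeof country_table[0]; i++)
    if (country_table[i].country == country)
      return country_table[i].charset;
  return NULL;
}

/* COUNTRY is the country code, or negative if it could not be
   obtained.  DEFAULT_CHARSET is the value of the environment
   variable of that name, or NULL if it is not set.  */
const char *
fallback_charset (int country, const char *default_charset)
{
  const char *name = country >= 0 ? charset_from_country (country) : NULL;

  if (name != NULL)
    return name;                      /* country code gave an answer */
  if (default_charset != NULL && *default_charset != '\0')
    return default_charset;           /* step 2: DEFAULT_CHARSET */
  return "cp437";                     /* step 3: last resort */
}
```

A caller would pass in something like getenv ("DEFAULT_CHARSET") as
the second argument; keeping the system call and the environment
lookup outside the function makes the ordering easy to test.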

> Would you mind changing in your port
> 
> 	  name = "char"; /* locale dependent */
> 
> into
> 
> <usual prologue for getting O_BINARY defined>
> #if O_BINARY
> 	  name = "char/crlf"; /* locale dependent but with CR-LF surface */
> #else
> 	  name = "char"; /* locale dependent */
> #endif

I think this is worse than what I suggest above.  I'm not even sure it
would be better than blindly assuming cp437 as the last resort, but
perhaps I'm wrong.  In any case, I think the possibility to estimate
the codepage from the country code should not be ignored.

> > it will evaluate the environment variable DEFAULT_CHARSET for
> > getting the appropriate charset. This character set always implies
> > the used surface. Of course, the average MSDOS/DJGPP user will never
> > set this value at all.
> 
> Which is exactly why we went through the config.charset horror. Once
> for all applications, including recode.

If setting DEFAULT_CHARSET is important, we could arrange for it to be
set in DJGPP.ENV, or even by the library startup code.  Alternatively,
recode or libiconv could include a (DJGPP-specific) static constructor
which pushes the correct DEFAULT_CHARSET into the program's
environment.
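
For the static constructor variant, a minimal sketch might look like
the one below.  It assumes GCC's `constructor' function attribute and
a POSIX-style setenv are available; the function name is invented for
the example, and the hard-coded "cp437" merely stands in for whatever
charset the real code would work out.

```c
#include <stdlib.h>

/* Placeholder for the charset the real code would determine;
   "cp437" is used here only as an assumption for the sketch.  */
#define GUESSED_CHARSET "cp437"

/* Runs before main.  The last argument to setenv is 0, so a
   DEFAULT_CHARSET already set by the user is left untouched.  */
static void __attribute__ ((constructor))
push_default_charset (void)
{
  setenv ("DEFAULT_CHARSET", GUESSED_CHARSET, 0);
}
```

With this linked in, any later getenv ("DEFAULT_CHARSET") in the
program sees either the user's own setting or the pushed value.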
