Mail Archives: djgpp-workers/1998/02/05/08:17:27

Date: Thu, 5 Feb 1998 14:15:05 +0100 (MET)
From: Hans-Bernhard Broeker <broeker AT physik DOT rwth-aachen DOT de>
To: DJ Delorie <dj AT delorie DOT com>
cc: djgpp workers list <djgpp-workers AT delorie DOT com>
Subject: Re: char != unsigned char... sometimes, sigh
In-Reply-To: <199802050213.VAA14414@delorie.com>
Message-ID: <Pine.LNX.3.93.980205133818.29202A-100000@acp3bf>
MIME-Version: 1.0

Hello, everyone.

I've been watching this thread, but so far I've kept my mouth shut.  But
when the suggestion came up that our current implementation was not
ANSI-compliant, I decided to jump in. I have P.J.Plauger's wonderful book
'The Standard C Library' at home, so I looked up what it had to say about
the <ctype.h> functions and macros. I'll comment based on that below.

> > This thread was born out of a concern that our ctype functions don't 
> > support EOF.  ANSI C requires this support.  Knowing that funny things 
> > will happen in this case doesn't seem to help a bit when we face the sad 
> > conclusion that our libc is not fully compliant with the ANSI C standard.

I first objected to this, but on looking it up, there certainly is a
problem with EOF here. If the passed 'c' is of type int, and its value is
-1, then the result has to be different from the one you get by passing it
255.

But there's *no* real problem with signed/unsigned chars, I think, as
Plauger clearly states that to use the is*() functions on a signed char,
you definitely *have* to cast to unsigned char first. I.e., in a setup
with signed chars, the only correct use of the ctype functions on char
would look like isalnum((unsigned char) c). So if anyone passes a signed
char to any of these functions, the error has already happened, and we're
not obliged in any way to protect the user from the resulting harm. 
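
To make the cast concrete, here is a minimal sketch of the calling
convention described above. The helper name count_alnum is made up for
illustration; only isalnum() itself comes from <ctype.h>.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: count the alphanumeric characters in a
   NUL-terminated string.  Where plain char is signed, bytes >= 0x80
   would arrive as negative ints; the cast to unsigned char maps them
   back into the 0..UCHAR_MAX range that the is*() functions accept. */
size_t count_alnum(const char *s)
{
    size_t n = 0;

    for (; *s != '\0'; ++s) {
        if (isalnum((unsigned char) *s))
            ++n;
    }
    return n;
}
```

Without the cast, a byte such as '\xff' in a signed char would be passed
as -1, which the callee cannot tell apart from EOF.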

> OK, then, how do we fix it?  Is there ever a case where the is*()
> functions/macros *care* if it's EOF or 0xff?  

In the 'C' locale: probably not. But who knows: maybe there is some locale
that has actually squeezed a printable character into this position? Well,
on looking it up: there is! According to a Linux man page, 0xff is 'LATIN
SMALL LETTER Y WITH DIAERESIS' in ISO-Latin-1. So in that locale,
'isgraph(0xff)' should return nonzero, and our implementation can't do that :-(

> The only ones I know of are tolower/toupper, which return 0 for EOF
> (funny, toupper/tolower return *unsigned* char!). 

Not really. They return an *int*, which in turn is the result of casting
an unsigned char to int. At least, that's what they *should* do. 
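
A small sketch of what that int-in, int-out contract looks like from the
caller's side. Both helper names are hypothetical. It relies on the ISO C
rule that toupper() returns its argument unchanged when there is no
upper-case counterpart, which covers EOF as well.

```c
#include <ctype.h>
#include <stdio.h>   /* EOF */

/* Hypothetical helper: upper-case a single char.  The argument is cast
   to unsigned char on the way in; the result comes back as an int that
   holds an unsigned char value and is narrowed again on the way out. */
char upcase_char(char c)
{
    int r = toupper((unsigned char) c);

    return (char) r;
}

/* Hypothetical helper for values read with getc(): EOF is a legal
   argument and is returned unchanged. */
int upcase_or_eof(int c)
{
    return toupper(c);
}
```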

> If we change that to return 0xff for EOF,
> then it won't matter if EOF==0xff, and we can just mask the value
> we're given with 0xff and be done with it (not even add 1).

I think the proper way is to just do away with all that '& 0xff' stuff in
our macros.  According to Plauger (who was a member of the ANSI C committee
X3J11, so he should know :-), and the standard, any call of a <ctype.h>
function with an argument that's neither in the range of unsigned char,
nor EOF, causes undefined behaviour. So we're fully allowed to just return
rubbish in such cases, or SegFault, or whatever. Same goes for anyone
passing a signed char without casting it to unsigned. 

Plauger also has a rather nice trick to avoid '+1' operations in the
macros:  he generates an array of 257 entries, and makes his equivalent of
the 'unsigned short * __dj_ctype_flags' (note he uses a *, not an []) 
point to the [1] element of that array. That way, he can properly handle
both EOF and regular arguments like this (cited from memory):

#define isalnum(c) (__dj_ctype_flags[(int)(c)] & __dj_ISALNUM)
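
The offset-pointer trick can be sketched as follows. This is not Plauger's
code and not our libc's: every my_* name and the flag values are invented
for illustration, the table is filled for ASCII letters and digits only,
and the sketch assumes EOF == -1 (checked at compile time).

```c
#include <stdio.h>   /* EOF */

/* Hypothetical flag bits. */
#define MY_ISDIGIT 0x01
#define MY_ISALPHA 0x02
#define MY_ISALNUM (MY_ISDIGIT | MY_ISALPHA)

/* Refuse to compile if EOF is not -1: negative array size otherwise. */
typedef char my_eof_is_minus_one[(EOF == -1) ? 1 : -1];

/* 257 entries: slot 0 belongs to EOF, slots 1..256 to values 0..255. */
static unsigned short my_ctype_table[257];

/* Points at element [1], so index -1 (EOF) lands on slot 0 and
   indices 0..255 land on slots 1..256, with no '+1' in the macro. */
static const unsigned short *my_ctype_flags = my_ctype_table + 1;

/* Fill the table; assumes an ASCII-compatible character set. */
void my_ctype_init(void)
{
    int c;

    my_ctype_table[0] = 0;              /* EOF belongs to no class */
    for (c = 0; c < 256; ++c) {
        unsigned short f = 0;

        if (c >= '0' && c <= '9')
            f |= MY_ISDIGIT;
        if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z'))
            f |= MY_ISALPHA;
        my_ctype_table[c + 1] = f;
    }
}

#define my_isalnum(c) (my_ctype_flags[(int)(c)] & MY_ISALNUM)
```

After my_ctype_init(), my_isalnum(EOF) and my_isalnum(0xff) read two
different table slots, so a locale could give 0xff a class of its own
without affecting EOF.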

Hans-Bernhard Broeker (broeker AT physik DOT rwth-aachen DOT de)
Even if all the snow were burnt, ashes would remain.

