Mail Archives: djgpp-workers/1998/02/06/03:22:04

Sender: vheyndri AT rug DOT ac DOT be
Message-Id: <34DAC80D.1EC@rug.ac.be>
Date: Fri, 06 Feb 1998 09:21:33 +0100
From: Vik Heyndrickx <Vik DOT Heyndrickx AT rug DOT ac DOT be>
Mime-Version: 1.0
To: Hans-Bernhard Broeker <broeker AT physik DOT rwth-aachen DOT de>
Cc: DJ Delorie <dj AT delorie DOT com>,
djgpp workers list <djgpp-workers AT delorie DOT com>
Subject: Re: char != unsigned char... sometimes, sigh
References: <Pine DOT LNX DOT 3 DOT 93 DOT 980205133818 DOT 29202A-100000 AT acp3bf>

Hans-Bernhard Broeker wrote:
> I've been watching this thread, but so far I've kept my mouth shut.  But
> when the suggestion came up that our current implementation were not
> ANSI-compliant, I decided to jump in. I have P.J.Plauger's wonderful book
> 'The Standard C Library' at home, so I looked up what it had to say about
> the <ctype.h> functions and macros. I'll comment based on that below.
> 
> > > This thread was born out of a concern that our ctype functions don't
> > > support EOF.  ANSI C requires this support.  Knowing that funny things
> > > will happen in this case doesn't seem to help a bit when we face the sad
> > > conclusion that our libc is not fully compliant with the ANSI C standard.

This thread was re-born (renaissance?). I originally started this thread
out of the concern that many users don't understand that (char)'\x84' is
less than 0, and I requested to modify the default of "char". Only later
I ran into these buggy is* macros, which don't work well primarily
because of the current default of "char". It may seem that this last
reason was the only reason to change it, but it certainly is not.
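
To make the first point concrete, here is a minimal sketch. It is not taken from the DJGPP sources, and the helper names `promoted_value` and `byte_value` are made up for illustration. It uses `signed char` explicitly so that the result does not depend on the compiler's default for plain "char", and it assumes 8-bit chars:

```c
/* Hypothetical helper: what you get when a signed char is simply
   promoted to int, as happens when it is passed to a function
   taking an int argument. */
static int promoted_value(signed char c)
{
    return c;                   /* sign-extended: negative for bytes >= 0x80 */
}

/* Hypothetical helper: the same byte seen as an unsigned char,
   which is the range the <ctype.h> functions are specified for. */
static int byte_value(signed char c)
{
    return (unsigned char)c;    /* always 0..255 with 8-bit chars */
}
```

With these, the byte 0x84 stored in a signed char is the value -124: `promoted_value(-124)` is negative, while `byte_value(-124)` gives 0x84 back.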
 
> I first objected to this, but on looking it up, there certainly is a
> problem with EOF here. If the passed 'c' is of type int, and its value is
> -1, then the result has to be different from the one you get by passing it
> 255.

Not necessarily, but it should be allowed to be different.
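
As an illustration only, the three kinds of argument under discussion (EOF, a value in the range of unsigned char, and anything else) can be told apart as in the sketch below. The enum and the function `classify_arg` are hypothetical names, not part of any libc:

```c
#include <limits.h>   /* UCHAR_MAX */
#include <stdio.h>    /* EOF */

/* Hypothetical classification of an int handed to an is*() function. */
enum arg_kind { ARG_EOF, ARG_CHAR, ARG_INVALID };

static enum arg_kind classify_arg(int c)
{
    if (c == EOF)
        return ARG_EOF;         /* must be accepted, distinct from 255 */
    if (c >= 0 && c <= UCHAR_MAX)
        return ARG_CHAR;        /* ordinary character value */
    return ARG_INVALID;         /* e.g. a sign-extended char such as -124 */
}
```

An implementation is free to give the same answer for EOF and 255 in a given locale, but it has to be able to see them as two different arguments, which a mask with 0xff cannot do when EOF is -1.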

> But there's *no* real problem with signed/unsigned chars, I think, as
> Plauger clearly states that to use the is*() functions on a signed char,
> you definitely *have* to cast to unsigned char first. I.e., in a setup
> with signed chars, the only correct use of the ctype functions on char
> would look like isalnum((unsigned char) c). So if anyone passes a signed
> char to any of these functions, the error has already happened, and we're
> not obliged in any way to protect the user from the resulting harm.

I agree, but the potential problems that may arise are much larger in
number with "signed char" being the default.
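
A minimal sketch of the cast that Plauger asks for, in a typical loop over a string. The function name `count_alnum` is hypothetical:

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: count the alphanumeric bytes in a string. */
static size_t count_alnum(const char *s)
{
    size_t n = 0;

    for (; *s != '\0'; ++s) {
        /* Cast first, so that a negative char never reaches isalnum(). */
        if (isalnum((unsigned char)*s))
            ++n;
    }
    return n;
}
```

Written this way the code behaves the same whether plain "char" is signed or unsigned. Without the cast, every byte from 0x80 upwards is an invalid argument on a signed-char setup.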

> > OK, then, how do we fix it?  Is there ever a case where the is*()
> > functions/macros *care* if it's EOF or 0xff?
> 
> In the 'C' locale: probably no. But who knows: maybe there is some locale
> that has actually squeezed a printable character into this position? Well,
> on looking it up: there is! According to a Linux man page, 0xff is 'LATIN
> SMALL LETTER Y WITH DIAERESIS' in ISO-Latin-1. So in that locale,
> 'isgraph(0xff)' should return 1, and our implementation can't do that :-(

:-)

> > The only ones I know of are tolower/toupper, which return 0 for EOF
> > (funny, toupper/tolower return *unsigned* char!).
> 
> Not really. They return an *int*, which in turn is the result of casting
> an unsigned char to int. At least, that's what they *should* do.

They don't work for EOF.
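
For reference, the standard has toupper() and tolower() return their argument unchanged when no conversion applies, and that includes EOF. A minimal sketch of that behaviour follows. `sketch_toupper` is a hypothetical name, and the range test assumes ASCII and the "C" locale only:

```c
#include <stdio.h>    /* EOF */

/* Hypothetical toupper() replacement that passes EOF through. */
static int sketch_toupper(int c)
{
    if (c == EOF)
        return EOF;             /* neither 0 nor (unsigned char)EOF */
    if (c >= 'a' && c <= 'z')   /* assumption: ASCII, "C" locale */
        return c - 'a' + 'A';
    return c;                   /* everything else is returned as is */
}
```

The explicit EOF test is redundant here, since the final return already covers it. It is spelled out only to show the case under discussion.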

> > If we change that to return 0xff for EOF,
> > then it won't matter if EOF==0xff, and we can just mask the value
> > we're given with 0xff and be done with it (not even add 1).
> 
> I think the proper way is to just do away with all that '& 0xff' stuff in
> our macros.  According to Plauger (who was a member of the ANSI C committee
> X3J11, so he should know :-), and the standard, any call of a <ctype.h>
> function with an argument that's neither in the range of unsigned char,
> nor EOF, causes undefined behaviour. So we're fully allowed to just return
> rubbish in such cases, or SegFault, or whatever. Same goes for anyone
> passing a signed char without casting it to unsigned.

:-)

> Plauger also has a rather nice trick to avoid '+1' operations in the
> macros:  he generates an array of 257 entries, and makes his equivalent of
> the 'unsigned short * __dj_ctype_flags' (note he uses a *, not an [])
> point to the [1] element of that array. That way, he can properly handle
> both EOF and regular arguments like this (cited from memory):
> 
> #define isalnum(c) (__dj_ctype_flags[(int)(c)] & __dj_ISALNUM)

Doesn't really matter as the compiler will produce the same object code,
which will add 1*sizeof(short unsigned) anyway when the pointer is
declared const, and INFERIOR code if it is not declared const.
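
For readers who have not seen the trick, here is a rough sketch of the 257-entry table with a pointer to its [1] element. All the `sk_` names and the flag value are hypothetical. This is neither Plauger's code nor the DJGPP implementation, and the table is filled for ASCII in the "C" locale only. It assumes EOF is -1, which the #if checks:

```c
#include <stdio.h>    /* EOF */

#if EOF != -1
#error "this sketch assumes EOF == -1"
#endif

#define SK_ALNUM 0x01   /* hypothetical flag bit */

/* Slot 0 is for EOF, slots 1..256 are for the values 0..255. */
static unsigned short sk_table[257];

/* Const pointer to the [1] element, so that index -1 lands on slot 0. */
static const unsigned short *const sk_flags = &sk_table[1];

/* No "& 0xff" and no "+ 1": the argument indexes the table directly. */
#define sk_isalnum(c) (sk_flags[(int)(c)] & SK_ALNUM)

/* Fill the table; assumption: ASCII letters and digits only. */
static void sk_init(void)
{
    int c;

    sk_table[0] = 0;    /* EOF belongs to no class */
    for (c = 0; c < 256; ++c) {
        int alnum = (c >= '0' && c <= '9')
                 || (c >= 'A' && c <= 'Z')
                 || (c >= 'a' && c <= 'z');
        sk_table[c + 1] = alnum ? SK_ALNUM : 0;
    }
}
```

After sk_init(), sk_isalnum('a') is non-zero, while sk_isalnum(EOF) and sk_isalnum(0xff) are both 0 but come from different slots, so a locale could set the flags for 0xff without touching EOF. The pointer is declared const here, which is the case where the compiler can resolve the offset at compile time.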

-- 
 \ Vik /-_-_-_-_-_-_/   
  \___/ Heyndrickx /          
   \ /-_-_-_-_-_-_/
