Date: Sat, 5 Nov 94 08:19:06 JST
From: Stephen Turnbull
To: nummedal AT pvv DOT unit DOT no
Cc: djgpp AT sun DOT soe DOT clarkson DOT edu
Subject: NULL pointers...

Dag Nummedal writes:

   The point was to get GCC to change its internal representation of
   the token 0 when used in a pointer context.  That is,

	int *p = 0;

   would in assembly make p point to a high memory location, not to
   memory location zero.  This way only badly broken C code, which
   assumes that

	int i = 0;

   can be cast to a null pointer, would fail.

I don't have access to a "modern" C-specific reference at the moment,
but according to Stroustrup, "The C++ Programming Language" 1e (1986):

   A pointer may be explicitly converted to any of the integral types
   large enough to hold it.  ...  The mapping function is also machine
   dependent....

   An object of integral type may be explicitly converted to a
   pointer.  The mapping always carries an integer converted from a
   pointer back to the same pointer, but is otherwise machine
   dependent.

Something like these rules has to be enforced, or systems programming
(specifically, memory-mapped device drivers) can't be done in C.

So I guess that code which searches for a video buffer (eg) and
returns an integer value which is the absolute address of that buffer
(hardware addresses are *not* C pointers) must therefore return

	(int) (void *) 0

to indicate "not found".  (A pretty good trick if that code was
written in another language.)  Code checking for it does something
like

	switch (address)
	  {
	  case (int) (void *) 0:	/* didn't find a video buffer */
	    break;
	  /* etc */

Otherwise it's "badly broken."  What you're saying, then, is that 0 is
0 except when it's the inverse of (void *) 0.  A programming language
is supposed to make life easier for humans, not for compilers and
automatic code verifiers.

(It occurs to me that the semantics of a cast are not the same as the
semantics of initialization, and therefore the above code doesn't
necessarily work, either.
Maybe it could be made to do so, but if not we need to define an
otherwise unused variable to do this:

	void * const null_pointer = 0;

	(int) null_pointer

gaaaakkk.  This may be unbroken code, but it looks like a pretty
broken language to me.)

I agree that for portability's sake, the above is necessary.  I can
imagine that there could be machines where it makes sense for some
reason to put NULL somewhere other than address 0.  But the
convenience of having 0 mean 0 in the context of the argument of the
conversion (type *) 0 seems overwhelming to me.  If you don't like
this, then define a new keyword (NULL would be a good candidate,
except lots of code seems to use it to mean "false") which means what
we currently call a NULL pointer, and disallow the use of *any*
integral value to mean NULL pointer.

I understand the logic of saying that in a program "0" is just a
token, and the internal representation can be whatever you want.  But
I think it's an unfair burden to place on everyday programming: "is
this an integer, or is it the special token used to represent null
pointers in initialization and comparison?  oh yeah, it is."

   The problem with this is that probably most of libc and go32 (at
   least the assembly parts) would assume that the null pointer points
   to memory location zero.

   If implemented, this could catch a lot of bad code, but it would
   also make porting programs to DJGPP harder.

I don't see why this catches bad code better than protecting page
zero.  Unless you're planning on protecting both, and logging the
names of any programmers whose code checks for (int) p == 0 where p is
a T*.

						--Steve