Mail Archives: djgpp/1994/10/31/03:08:41

Date: Mon, 31 Oct 94 13:59:04 JST
From: Stephen Turnbull <turnbull AT shako DOT sk DOT tsukuba DOT ac DOT jp>
To: dj AT stealth DOT ctron DOT com
Cc: tony AT nt DOT tuwien DOT ac DOT at, djgpp AT sun DOT soe DOT clarkson DOT edu
Subject: NULL pointers in (ANSI) string functions [was: strcat() ?]

   > [Anton.Helm] would like to know what 
   > 
   > strcat(a, b);
   > 
   > should do when b happens to be NULL.

DJ says:

   It *should* cause a code fault, stack trace, and immediate exit to
   DOS, since you're accessing memory that is off limits.  You won't get
   this under DPMI though because of the way DPMI memory is set up.

   Passing NULL to strcat is illegal.

I have been through this once before: passing cp to *any* string
function, where char *cp = 0, is illegal.  Strictly speaking, though,
it may be undefined behavior under the ANSI standard, in which case
implementations *are allowed* to check for NULL pointers and turn them
into pointers to `""'.  I assume this dangerous interaction, between
the general rule that "a NULL pointer is allowed anywhere that a legal
non-NULL pointer of that type is" as far as the compiler is concerned
and the undefined behavior of the string functions when passed NULL
pointers, is permitted for efficiency (it's hard to imagine a more
elegant implementation than

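char *strcat (char *d, const char *s)
{
  register char *td = d;
  register const char *ts = s;
  while (*td) td++;            /* advance to the terminating '\0' of d */
  while ((*td++ = *ts++)) ;    /* copy s, including its '\0' */
  return d;
}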

especially if provision is made for inlining).
    If ANSI makes it illegal (not undefined), then the following
discussion doesn't belong here (and I apologize), since GCC attempts
to provide the option of ANSI conformance.  If it's undefined then
theoretically DJ could provide more robust libraries, but this is
dangerous, as we'll see.
    The problem is that not everyone will agree, at least in some
cases, what should be done with NULL pointers.  For s = NULL, the
obvious (and inexpensive) answer is "do nothing".  I can't see a
problem with that, ever.  (I'm not very good at predicting bugs
though; anybody else see a problem with it?  See below for the closely
related consideration of associativity, however.)
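
    A minimal sketch of the "do nothing" rule for s = NULL follows.
The name lenient_strcat is hypothetical (it is not a library function),
and d is still assumed to be a valid, NUL-terminated buffer with
enough room for the result.

```c
#include <stddef.h>

/* Hypothetical helper, not part of any C library: behaves like strcat,
   except that a NULL source is treated as "nothing to append". */
char *lenient_strcat(char *d, const char *s)
{
    char *td = d;

    if (s == NULL)            /* the one extra test: do nothing */
        return d;
    while (*td)               /* find the end of d (d must be non-NULL) */
        td++;
    while ((*td++ = *s++))    /* copy s, including its terminating '\0' */
        ;
    return d;
}
```

The cost is a single comparison per call, which is why this case looks
inexpensive; the sketch deliberately does nothing about d = NULL.
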
    For d = NULL, all of the possibilities are a problem.  Returning
NULL is easy, but who wants that?  Returning s is easy, but since s is
declared const char *, overwriting the return value (e.g., with a later
strcat) would be seriously impolite and impossible to find in the
programmer's own source; he'd have to look at the implementation in
the library.  (Note that s is not necessarily known by the compiler to
be constant; it may be a variable whose value is unchanged for a
greater scope than that of d.)  Returning a pointer to a static buffer
(a) means that the return value of strcat will sometimes get
overwritten (yuck) and (b) can't necessarily hold the return value
since it's of fixed size (shades of the DOS command line!).  Returning
a string allocated on the heap is a serious memory leak, since you
don't know whether (in general) a return value from strcat will need
to be free'd or not, especially since lazy programmers will take great
advantage of initializing all otherwise uninitialized strings to NULL.
The argument d to strcat is required to be preallocated enough space
to contain the result, again for efficiency reasons, so assigning a
pointer to "" to d won't work.
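
    The static-buffer option can be sketched as below, to show both
objections at once.  The name static_strcat and the buffer size
CAT_BUFSIZE are assumptions made up for this illustration; s is
assumed to be non-NULL.

```c
#include <stddef.h>
#include <string.h>

#define CAT_BUFSIZE 16   /* arbitrary size chosen for this sketch */

/* Hypothetical variant: when d is NULL, the result is built in a
   fixed-size static buffer instead of faulting. */
char *static_strcat(char *d, const char *s)
{
    static char buf[CAT_BUFSIZE];

    if (d != NULL)
        return strcat(d, s);            /* normal case: use the library */
    strncpy(buf, s, CAT_BUFSIZE - 1);   /* (b) long sources are truncated */
    buf[CAT_BUFSIZE - 1] = '\0';
    return buf;                         /* (a) every NULL-d call shares buf */
}
```

After r1 = static_strcat(NULL, "first") and r2 = static_strcat(NULL,
"second"), r1 and r2 are the same pointer and both read "second": the
first result has been overwritten behind the caller's back.
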
    Having different rules for different arguments, even if obvious,
would still lead to problems.  E.g., string concatenation is associative
in value (but not in side effects, of course).  Why should the
programmer have to worry about the difference between

strcat(strcat(a,b),c)

and 

strcat(a,strcat(b,c))?

The former would be legal when a is non-NULL, the latter would be
legal only if a and b are non-NULL.  (Some people might consider this
to be the problem I wanted above.)  And what about assignments in the
argument expressions (which might later be used as string arguments),
or the various associations possible for

strcat(a,strcat(b,strcat(c,d)))?

In these cases the programmer must either ensure that all strings are
initialized or trap for NULL pointers in her own code.  And to give
you some idea of just how ugly it could get, suppose the programmer in
C++ did

class String {
private:
  char *s;
public:
  String (const char *initializer) { s = strdup(initializer); }
  String operator+ (String source)
    { char *init = (char *) malloc(strlen(s)+strlen(source.s)+1);
      (void) strcpy(init,s);           // start from the left operand
      (void) strcat(init,source.s);    // then append the right operand
      String newString(init);
      free(init);
      return newString;
    }
  };

C and C++ do not specify the order in which the operands of an
expression are evaluated; the implementation is permitted to optimize.  Oops.
    I trust I've made my point.
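
    To make the earlier grouping problem concrete, here is a sketch
under one assumed rule set: a NULL source appends nothing, and a NULL
destination returns NULL.  The name mixed_strcat and the rule set
itself are hypothetical, chosen only to show how the two nestings
diverge.

```c
#include <stddef.h>

/* Hypothetical rule set, for illustration only:
   NULL source -> append nothing; NULL destination -> return NULL. */
char *mixed_strcat(char *d, const char *s)
{
    char *td = d;

    if (d == NULL)            /* nowhere to write: give up */
        return NULL;
    if (s == NULL)            /* nothing to append */
        return d;
    while (*td)               /* find the end of d */
        td++;
    while ((*td++ = *s++))    /* copy s, including its '\0' */
        ;
    return d;
}
```

With a holding "x", b NULL and c holding "z",
mixed_strcat(mixed_strcat(a,b),c) leaves "xz" in a, while
mixed_strcat(a,mixed_strcat(b,c)) leaves "x" and silently drops c.
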

+-----------------------------------------------------------------------+
|                           Stephen Turnbull                            |
|     University of Tsukuba, Institute of Socio-Economic Planning       |
|          Tennodai 1-chome 1--1, Tsukuba, Ibaraki 305 JAPAN            |
|        Phone:  +81 (298) 53-5091     Fax:  +81 (298) 55-3849          |
|               Email:  turnbull AT shako DOT sk DOT tsukuba DOT ac DOT jp                 |
|                                                                       |
|                Founder and CEO, Skinny Boy Associates                 |
|               Mechanism Design and Social Engineering                 |
| REAL solutions to REAL problems of REAL people in REAL time!  REALLY. |
|                      Phone:  +81 (298) 56-2703                        |
+-----------------------------------------------------------------------+
