Mail Archives: djgpp/1994/06/21/18:02:11

From: Zach Heilig <heilig AT cs DOT und DOT nodak DOT edu>
Subject: Re: O_TEXT/O_BINARY grief
To: djgpp AT sun DOT soe DOT clarkson DOT edu
Date: Tue, 21 Jun 1994 16:15:14 -0500 (CDT)

Joe Smith says:
> OTOH, can anyone give me a for-instance where treating all six of the
> above characters as 'whitespace' leads to incorrect semantics?  And I
> don't think quoted strings that span source lines count, since there
> are perfectly portable ways to do that without embedding control
> characters in the source.  In the case mentioned (cpp macros), so you
> remove backslash-linefeed pairs and leave the carriage-return: the
> compiler will see the carriage-return as just another 'whitespace',
> the same as the tab(s) you might add at the beginning of the following
> line.

(All six of those characters are white-space, so there aren't any
semantics to get screwed up.)
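For reference, here is a minimal C check of that claim. It assumes the six characters under discussion are the ones isspace() classifies as white-space in the "C" locale: space, horizontal tab, new-line, vertical tab, form feed and carriage return. That is the library's classification; the array and function names below are hypothetical.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumption: these are the six characters the quoted message means. */
static const char six_chars[] = { ' ', '\t', '\n', '\v', '\f', '\r' };

/* Returns true if isspace() reports every one of the six as white-space.
 * The cast to unsigned char avoids undefined behavior for negative chars. */
bool all_six_are_whitespace(void)
{
    for (size_t i = 0; i < sizeof six_chars; i++) {
        if (!isspace((unsigned char)six_chars[i]))
            return false;
    }
    return true;
}
```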

However, the order of the characters at the end of a continued line is
normally:

This is a line\^M^J
This is supposed to be the rest of it.

The character that would be removed along with the backslash is the
'^M' (which, if you follow the letter of the standard, it is not), and
the '^J' is still left behind to cause trouble (on OSes where only '^J'
is a newline).

> Is cpp *required* by the standard to remove backslash-linefeed pairs?
> It might conceivably keep the newline as part of the macro body.  If
> you write macros that *depend* on one behavior or another (say,
> splitting an identifier or operator), then you will justly suffer for
> it.

(The only things that can't be split are tri-graphs)

From 5.1.1.2 (Translation phases) of the ISO C standard:

1: Physical source file characters are mapped to the source character
set (including new-line characters for end-of-line characters).
Tri-graphs are replaced with single character internal
representations.
(Note that on any OS whose text files use a single-character newline,
^M^J isn't replaced with ^J.)
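A hypothetical sketch of just the end-of-line part of phase 1 makes the difference concrete. The function map_eol and its crlf_host flag are illustrative assumptions, not anything taken from the standard or from djgpp: on a host whose text files end lines with ^M^J the ^M is dropped, while on a ^J-only host the ^M passes through untouched.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of end-of-line mapping in translation phase 1.
 * crlf_host == true : host text files end lines with CR LF (^M^J).
 * crlf_host == false: host uses a single LF (^J) as its newline.
 * Copies src into dst (dst must hold at least strlen(src) + 1 bytes)
 * and returns the length of the result. */
size_t map_eol(const char *src, char *dst, bool crlf_host)
{
    size_t out = 0;
    for (size_t i = 0; src[i] != '\0'; i++) {
        /* On a CR LF host, drop the CR; the LF that follows becomes
         * the C new-line character. */
        if (crlf_host && src[i] == '\r' && src[i + 1] == '\n')
            continue;
        dst[out++] = src[i];
    }
    dst[out] = '\0';
    return out;
}
```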

2: Each instance of a new-line character and an immediately preceding
backslash character is deleted.
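(Note that at this point, \^M^J is still \^M^J for any compiler on such
an OS, since ^M^J isn't a C new-line there: the backslash is followed
by ^M rather than by a new-line, so the two lines are not spliced.)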
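Phase 2 can be sketched in a few lines of C as well. splice_lines is a hypothetical name, and the sketch assumes phase 1 has already run (so trigraphs such as ??/ are already plain backslashes). It deletes a backslash only when the very next character is '\n', which is why \^M^J survives on a ^J-only host.

```c
#include <stddef.h>

/* Sketch of translation phase 2: delete each backslash that is
 * immediately followed by a new-line ('\n'), together with that new-line.
 * A backslash followed by '\r' is left alone, because '\r' is not a
 * new-line. Works in place on a NUL-terminated string and returns the
 * new length. */
size_t splice_lines(char *s)
{
    size_t in = 0, out = 0;
    while (s[in] != '\0') {
        if (s[in] == '\\' && s[in + 1] == '\n') {
            in += 2; /* drop the backslash-new-line pair */
            continue;
        }
        s[out++] = s[in++];
    }
    s[out] = '\0';
    return out;
}
```

Called on "This is a line\\\nthe rest" it joins the two lines into one; called on "This is a line\\\r\nthe rest" it leaves the string exactly as it was.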

etc...
language is parsed/translated...
etc...

-- 
	Zach Heilig	(heilig AT cs DOT und DOT nodak DOT edu) ==
			(heilig AT agassiz DOT cas DOT und DOT nodak DOT edu)



  Copyright © 2019   by DJ Delorie     Updated Jul 2019