From: Zach Heilig
Subject: Re: O_TEXT/O_BINARY grief
To: djgpp AT sun DOT soe DOT clarkson DOT edu
Date: Tue, 21 Jun 1994 16:15:14 -0500 (CDT)

Joe Smith says:
> OTOH, can anyone give me a for-instance where treating all six of the
> above characters as 'whitespace' leads to incorrect semantics? And I
> don't think quoted strings that span source lines count, since there
> are perfectly portable ways to do that without embedding control
> characters in the source. In the case mentioned (cpp macros), so you
> remove backslash-linefeed pairs and leave the carriage-return: the
> compiler will see the carriage-return as just another 'whitespace',
> the same as the tab(s) you might add at the beginning of the following
> line.

(All six of those characters are whitespace, so there aren't any
semantics to get screwed up.) However, the order of the characters is
normally:

    This is a line\^M^J
    This is supposed to be the rest of it.

The character that would be affected is the '^M' (if you follow the
letter of the standard, it's not), and the '^J' is still there to cause
trouble (on OSes where only '^J' is a newline).

> Is cpp *required* by the standard to remove backslash-linefeed pairs?
> It might conceivably keep the newline as part of the macro body. If
> you write macros that *depend* on one behavior or another (say,
> splitting an identifier or operator), then you will justly suffer for
> it.

(The only things that can't be split are trigraphs.)

From 5.1.1.2 (ISO C standard):

1: Physical source file characters are mapped to the source character
   set (including new-line characters for end-of-line characters).
   Trigraphs are replaced with single-character internal
   representations.

   (Note that on any OS whose text files use a single-character
   newline, ^M^J isn't replaced with ^J.)

2: Each instance of a new-line character and immediately preceding
   backslash character is deleted.

   (Note that at this point, \^M^J is still \^M^J for any compiler,
   since ^M^J isn't a C new-line.)

etc... language is parsed/translated... etc...

-- 
Zach Heilig (heilig AT cs DOT und DOT nodak DOT edu) == (heilig AT agassiz DOT cas DOT und DOT nodak DOT edu)