From: bfishman AT nirvanah DOT corp DOT es DOT com (Barry Fishman)
Subject: Re: ASCII and BINARY files. Why?
Date: 31 Jan 1997 00:46:11 -0800
Approved: cygnus DOT gnu-win32 AT cygnus DOT com
Distribution: cygnus
Message-ID: <199701310237.TAA19218.cygnus.gnu-win32@nirvanah.corp.es.com>
Original-To: gnu-win32 AT cygnus DOT com
Original-Cc: bfishman AT es DOT com (Barry Fishman)
In-reply-to: Your message of "Thu, 30 Jan 1997 01:10:41 PST." <32F06591 DOT 2169 AT netcom DOT com>
Original-Sender: owner-gnu-win32 AT cygnus DOT com

Grant Leslie wrote:
> However, this step seems a touch superfluous if you are already
> sitting at a PC, and far from user friendly. If you don't think so,
> try explaining to the "average" Win95 or WinNT user (ok, the WinNT
> user might be MUCH easier) why some of their software works fine,
> but if they use this other stuff we made for them, they need to
> go to a "DOS prompt" and type this command before they can use the
> file AND have it look right if they intend to use the stuff you
> wrote...
> After all, just because we all write this software, let's not forget
> that it's the end user who really has to get the use out of what we
> make.

First, I think targeting any of this work at the average Win95 or WinNT user is foolish. GNU/POSIX is an environment for software professionals, not end users. It's a means of making the GNU tools available on NT platforms for people who feel this environment gives them a productivity edge. End users don't see any of this unless they want to. They get the use out of what we make, not out of what we use to make it.

Now back to the ASCII/BINARY discussion. I think we need to follow the principle of least astonishment. When one opens a file and sees ^M's (carriage returns) at the end of each line, one can tell right away what is going on. Even NOTEPAD's run-together lines are obvious after the first time you see them. Running the file through a simple filter fixes it. One just needs 'totext' and 'tobinary' filters.
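A minimal sketch of the two filters in C, assuming 'tobinary' means collapsing each CR LF pair to a bare LF and 'totext' means expanding each bare LF to CR LF. The function names to_binary and to_text are hypothetical, and for brevity they work on an in-memory buffer rather than on stdin/stdout; a real filter would loop over fread()/fwrite() blocks and take care with a CR that falls at the very end of one block.

```c
#include <stddef.h>

/* 'tobinary' (hypothetical name): collapse every CR LF pair to a bare LF.
 * A lone CR that is not followed by LF is copied unchanged.
 * out must have room for at least len bytes; returns bytes written. */
size_t to_binary(const char *in, size_t len, char *out)
{
    size_t i, o = 0;

    for (i = 0; i < len; i++) {
        if (in[i] == '\r' && i + 1 < len && in[i + 1] == '\n')
            continue;               /* drop the CR; the LF is copied next */
        out[o++] = in[i];
    }
    return o;
}

/* 'totext' (hypothetical name): expand every bare LF to CR LF.
 * LFs that already have a CR in front are left alone, so running the
 * filter twice does no harm.
 * out must have room for up to 2 * len bytes; returns bytes written. */
size_t to_text(const char *in, size_t len, char *out)
{
    size_t i, o = 0;

    for (i = 0; i < len; i++) {
        if (in[i] == '\n' && (i == 0 || in[i - 1] != '\r'))
            out[o++] = '\r';        /* insert the missing CR */
        out[o++] = in[i];
    }
    return o;
}
```

Because to_text leaves existing CR LF pairs alone, applying it to a file that is already in DOS format is harmless, and to_binary ignores stray CRs that are not part of a line ending.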
What is difficult is having to spend hours patching each application to get around unexpected problems with seek addresses and with files that don't match their expected sizes.

I think ANSI created the problem by having the binary/text decision made by the application rather than making it a property of the file. File offsets now depend on how the file was opened, not on the file itself. This is asking for problems. On file systems where text mode is an attribute of the file (like VMS), one can at least detect that a file is the wrong type and let the open() fail with a readable error message. Pipes set up by the shell can be binary at both ends, since that is how C sees every file stream once it has been opened.

Do NT file systems record this information? (I'm far from an NT expert.) If they don't, the practical choices seem to be:

a) Open everything as text. Patch bash and the important binary tools like tar and gzip to get around the problems. You would then need to maintain these tools over time if the FSF is unwilling to take on testing its changes on WinNT/Win95 platforms. (As you can see, I don't like this approach.)

b) Open everything as binary and let the user deal with the simpler data-conversion problems.

c) Have the open() call guess the file type. I think that is what Perl's -T test does. One might scan the first bufferful and check that all the bytes are reasonable ASCII, with a <CR> preceding the first <LF>.

The (c) approach would make ASCII/BINARY mistakes rarer, which would probably increase the astonishment exponentially when one does happen. Therefore, the user would still need to be able to fall back to the (b) approach, for example by setting an environment variable.

Barry Fishman

-
For help on using this list, send a message to "gnu-win32-request AT cygnus DOT com" with one line of text: "help".
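The guessing approach could be sketched in C as follows. It assumes that "reasonable ASCII" means printable characters plus tab, CR, LF and form feed, and it reports text only when the first LF in the buffer is preceded by a CR, i.e. when the data already looks like a DOS text file. The names guess_mode and choose_mode and the TEXTMODE_GUESS environment variable are hypothetical; the variable stands in for the always-binary fallback switch described above. This is a heuristic in the same spirit as Perl's file test, not Perl's actual algorithm.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

enum file_mode { MODE_BINARY, MODE_TEXT };

/* Assumed definition of "reasonable ASCII": printable characters plus
 * the common whitespace control codes. */
static int reasonable_ascii(unsigned char c)
{
    return (c >= 0x20 && c < 0x7f) || c == '\t' || c == '\r' ||
           c == '\n' || c == '\f';
}

/* Guess from the first block read from a file (buf must be valid).
 * Anything that is not clearly DOS-style text is treated as binary,
 * so the guess errs on the side of leaving bytes untouched. */
enum file_mode guess_mode(const unsigned char *buf, size_t len)
{
    const unsigned char *lf;
    size_t i;

    for (i = 0; i < len; i++)
        if (!reasonable_ascii(buf[i]))
            return MODE_BINARY;     /* e.g. gzip or tar data */

    lf = memchr(buf, '\n', len);    /* find the first LF */
    if (lf != NULL && lf > buf && lf[-1] == '\r')
        return MODE_TEXT;           /* CR LF line ending: DOS text */
    return MODE_BINARY;             /* LF-only text or no newline at all */
}

/* Hypothetical override: TEXTMODE_GUESS=off disables guessing and
 * falls back to opening everything as binary. */
enum file_mode choose_mode(const unsigned char *buf, size_t len)
{
    const char *s = getenv("TEXTMODE_GUESS");

    if (s != NULL && strcmp(s, "off") == 0)
        return MODE_BINARY;
    return guess_mode(buf, len);
}
```

An open() wrapper built this way would read the first block, call choose_mode(), and then either translate line endings or pass bytes through unchanged. LF-only files are deliberately classed as binary, since in a Unix-like environment they need no translation.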