From: bfishman AT nirvanah DOT corp DOT es DOT com (Barry Fishman)
Subject: Re: ASCII and BINARY files. Why?
Date: 31 Jan 1997 00:46:11 -0800
Approved: cygnus DOT gnu-win32 AT cygnus DOT com
Distribution: cygnus
Message-ID: <199701310237.TAA19218.cygnus.gnu-win32@nirvanah.corp.es.com>
Original-To: gnu-win32 AT cygnus DOT com
Original-Cc: bfishman AT es DOT com (Barry Fishman)
In-reply-to: Your message of "Thu, 30 Jan 1997 01:10:41 PST."
             <32F06591 DOT 2169 AT netcom DOT com> 
Original-Sender: owner-gnu-win32 AT cygnus DOT com


Grant Leslie wrote:
> However this step seems a touch superfluous if you are already
> sitting at a PC, and far from user friendly. If you don't think so,
> try explaining to the "average" Win95 or WinNT user (ok, the WinNT
> user might be MUCH easier) why some of their software works fine,
> but if they use this other stuff we made for them, they need to
> go to a "DOS prompt" and type this command before they can use the
> file AND have it look right if they intend to use the stuff you
> wrote...
> After all, just because we all write this software, let's not forget
> that it's the end user that has to really get the use out of what we
> make.

First, I think targeting any of this work at the average Win95 or
WinNT user is foolish.  GNU/POSIX is an environment for software
professionals, not end users.  It's a means of making the GNU tools
available on NT platforms for people who feel this environment gives them
a productivity edge.  End users don't see any of this, unless they
want to.  They get the use out of what we make, not what we use to make it.

Now back to the ASCII/BINARY discussion.  I think we need to follow
the principle of least astonishment.  When you open a file and
see ^M's at the end of each line, you can tell right away what is
going on.  Even NOTEPAD's run-together lines are obvious after the first
time you see them.  Running the file through a simple filter fixes it.
One just needs 'totext' and 'tobinary' filters.
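
As a rough illustration of how small such filters could be, here is a
minimal Python sketch.  It assumes 'tobinary' means "strip the CR from
each CR LF pair" and 'totext' means "end every line with CR LF"; only
the two names come from this message, the behaviour is an assumption.

```python
def tobinary(data: bytes) -> bytes:
    """Replace every CR LF pair with a bare LF."""
    return data.replace(b"\r\n", b"\n")


def totext(data: bytes) -> bytes:
    """End every line with CR LF.

    Existing CR LF pairs are normalised first so that running the
    filter twice does not produce CR CR LF.
    """
    return tobinary(data).replace(b"\n", b"\r\n")
```

To use either one as a command-line filter, one could read all of
sys.stdin.buffer, pass the bytes through the function, and write the
result to sys.stdout.buffer.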

What is difficult is having to spend hours patching each application
to get around unexpected problems with seek addresses and files that
don't match their expected sizes.

I think ANSI C created the problem by leaving the binary/text decision
to the application instead of making it a property of the file.  File
offsets now depend on how the file was opened, and not on the file
itself.  This is asking for problems.
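
The size mismatch is easy to reproduce.  The sketch below uses Python's
newline translation as a stand-in for a C runtime's text mode (an
assumption for illustration, not the same implementation) and reports
how many bytes or characters the same file appears to hold depending on
how it was opened.  The helper name sizes_seen is made up for this
example.

```python
import os
import tempfile


def sizes_seen(raw: bytes):
    """Return (size on disk, length read in binary mode, length read in text mode)."""
    fd, path = tempfile.mkstemp()
    try:
        # Write the bytes untranslated so the file holds exactly `raw`.
        with os.fdopen(fd, "wb") as f:
            f.write(raw)
        on_disk = os.path.getsize(path)
        with open(path, "rb") as f:
            binary_len = len(f.read())
        # newline=None turns each CR LF into a single "\n" while reading.
        with open(path, "r", encoding="ascii", newline=None) as f:
            text_len = len(f.read())
        return on_disk, binary_len, text_len
    finally:
        os.remove(path)
```

For b"one\r\ntwo\r\n" this returns (10, 10, 8): the text-mode reader
sees two fewer characters than the file system reports, so any offset
computed from one view is wrong in the other.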

On file systems where text mode is an attribute of the file (like VMS)
one can at least detect the file is the wrong type, and let the open()
fail with a readable error message.  Pipes set up by the shell can be
binary at both ends, since that is how C sees every file stream once
it has been opened.

Do NT file systems record this information? (I'm far from an NT expert.)
If they don't, the practical choices seem to be:

a) Open everything as text.  Patch bash and the important binary
   tools like tar and gzip to get around the problems.  You would then
   need to maintain these tools over time if the FSF does not want
   to take on testing their changes on WinNT/Win95 platforms. (As you
   can see, I don't like this approach.)

b) Open everything as binary and let the user deal with the simpler data
   conversion problems.

c) Have the open() call guess the file type.  I think that is what
   perl's -T test does.  One might scan the first bufferful and see if
   all the bytes are reasonable ASCII with a <cr> preceding the first
   <nl>.

The (c) approach would make the ASCII/BINARY mistakes rarer, which would
probably exponentially increase the astonishment when one happens.
Therefore, the user would still need to be able to fall back to the
(b) approach by doing something like setting an environment variable.
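
A minimal Python sketch of the guessing heuristic with a binary
fallback might look like the following.  The buffer size, the set of
bytes counted as "reasonable ASCII", and the environment variable name
OPEN_MODE_OVERRIDE are all assumptions made up for illustration; it
also assumes the fallback intended is "open everything as binary".

```python
import os

BUFFER_SIZE = 512                    # assumed size of "the first buffer"
OVERRIDE_VAR = "OPEN_MODE_OVERRIDE"  # hypothetical variable name

# Tab, LF, form feed and CR, plus printable ASCII.
_REASONABLE = frozenset((9, 10, 12, 13)) | frozenset(range(32, 127))


def looks_like_text(buf: bytes) -> bool:
    """True if every byte is reasonable ASCII and a CR precedes the first LF."""
    if not all(b in _REASONABLE for b in buf):
        return False
    first_nl = buf.find(b"\n")
    # No LF at all, or an LF with no CR before it, counts as binary.
    return first_nl > 0 and buf[first_nl - 1] == 13


def choose_mode(first_buffer: bytes, environ=os.environ) -> str:
    """Return "text" or "binary" for a file whose first bytes are given."""
    if environ.get(OVERRIDE_VAR) == "binary":
        return "binary"  # user asked to skip guessing
    return "text" if looks_like_text(first_buffer) else "binary"
```

A caller would read up to BUFFER_SIZE bytes from the file in binary
mode, pass them to choose_mode, and then reopen or configure the stream
accordingly.  Defaulting to binary whenever the guess is unsure keeps
the failure mode to the visible ^M kind rather than the silent offset
kind.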

Barry Fishman
<bfishman AT es DOT com>