Mail Archives: djgpp/1996/11/01/20:24:00

Message-ID: <327AC62D.59A6@cs.com>
Date: Fri, 01 Nov 1996 19:55:25 -0800
From: "John M. Aldrich" <fighteer AT cs DOT com>
Reply-To: fighteer AT cs DOT com
Organization: Three pounds of chaos and a pinch of salt
MIME-Version: 1.0
To: Indrek Mandre <indrek AT warp DOT edu DOT ee>
CC: djgpp AT delorie DOT com
Subject: Re: Problems with fread/fwrite
References: <Pine DOT LNX DOT 3 DOT 91 DOT 961101194440 DOT 220A-100000 AT warp DOT edu DOT ee>

Indrek Mandre wrote:
> 
> Yes, I've never programmed on DOS, but I have correct assumptions.

If you have never programmed on DOS, then how do you know your
assumptions are correct?  Would you make similar assumptions if you had
to program for a Macintosh?  The ANSI spec says nothing about how file
storage is performed on any given computer system, only that it must
appear the same way in the program itself.  When you read a text file in
DOS, the C library automatically transforms each CRLF pair into a single
LF (newline) so it _looks_ like a Unix file.

Kludgy?  Blame DOS, not DJGPP.

> Well, I have only 1 thing to say and think: DOS _is_ stupid.
> I thought that djgpp is like gcc. But now I know - djgpp is not gcc.
> Djgpp is stupid????

Excuse me.  You seem awfully fond of the word "stupid."  DOS is a
kludgy, badly designed operating system, agreed, but it is used by the
majority of personal computers in the world.  DJGPP is a _direct_ port
of GNU, and it is _fully_ ANSI and POSIX compliant.  Every effort has
been made to properly handle the different styles of file organization
in a DOS system, but even that can't help you if you don't get text and
binary modes right.

YOU SHOULD NEVER MIX TEXT AND BINARY FILE OPERATIONS IN ANY PROGRAM,
REGARDLESS OF THE OPERATING SYSTEM YOU ARE ON!!  This is specified by
every C textbook I have ever seen.

> 
> > Ex. 2
> > The reason this problem is being triggered for you is because you are never
> > initializing any of the variables you declare.  The ANSI spec makes no
> > guarantee that malloc'ed memory is zeroed, nor non-static local variables,
> 
> Wrong! It's only because of the fopen/fread.

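Only in the sense that you are mixing text and binary modes.  fread on a
text-mode stream still goes through newline translation and end-of-file
character handling, so it is not suited to reading raw binary data.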
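
A minimal sketch of the binary-mode side of this, assuming a hypothetical
struct record and hypothetical helper names save_record/load_record (none
of these come from the original program).  The only point is the "b" in
the fopen mode string, which keeps fwrite/fread away from any text-mode
translation:

```c
#include <stdio.h>

/* Hypothetical record type, for illustration only. */
struct record {
    int  id;
    char payload[8];
};

/* Write one record in binary mode.  Returns 0 on success, -1 on failure. */
int save_record(const char *path, const struct record *r)
{
    FILE *f = fopen(path, "wb");    /* "b": bytes are written verbatim */
    size_t n;

    if (f == NULL)
        return -1;
    n = fwrite(r, sizeof *r, 1, f);
    if (fclose(f) != 0 || n != 1)
        return -1;
    return 0;
}

/* Read one record back in binary mode.  Returns 0 on success, -1 on failure. */
int load_record(const char *path, struct record *r)
{
    FILE *f = fopen(path, "rb");    /* "b": bytes are read verbatim */
    size_t n;

    if (f == NULL)
        return -1;
    n = fread(r, sizeof *r, 1, f);
    fclose(f);
    return n == 1 ? 0 : -1;
}
```

With both ends opened in binary mode, a payload containing CR, LF, or
Ctrl-Z bytes should come back unchanged on DOS and Unix alike.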

> I'm not using ANSI C. I think djgpp is _not_ ansi c. I've read only one
> book - Kernighan-Ritchie's "The C programming language". My programming
> is based on that. There was no need to zero variables.

DJGPP is 100% ANSI C compliant.  K&R is an excellent reference book, but
the K&R standard is obsolete.  I am certain, however, that K&R tells you
that global and static variables are automatically initialized to zeroes
when they are first created.  Automatic variables and malloc'ed memory
are not initialized to anything.  You must handle any necessary
initialization or they will contain garbage.  Alternatively, you can use
calloc() to get pre-zeroed memory.  The old version of DJGPP (v1.x) used
to automatically zero all malloc'ed memory, but this was determined to
be too slow for many programs that frequently used malloc().  When v2
came out, this change broke many programs that didn't initialize
properly.  The error was theirs, not DJGPP's.
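
As a rough illustration of the two options, here is a sketch with
hypothetical helper names (alloc_zeroed_ints and alloc_filled_ints are
mine, not part of any library).  calloc() hands back memory with all
bytes zero; with malloc() the contents are indeterminate until you store
something:

```c
#include <stdlib.h>

/* Allocate n ints that all start out as zero.  Returns NULL on failure. */
int *alloc_zeroed_ints(size_t n)
{
    return calloc(n, sizeof(int));  /* calloc zero-fills the block */
}

/* Allocate n ints with malloc and initialize each one explicitly. */
int *alloc_filled_ints(size_t n, int value)
{
    int *p = malloc(n * sizeof *p); /* contents are garbage at this point */
    size_t i;

    if (p == NULL)
        return NULL;
    for (i = 0; i < n; i++)
        p[i] = value;               /* now every element is defined */
    return p;
}
```

Either way the caller owns the block and must free() it.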

> 
> > Because you never initialize them, p, t, a, and b all contain garbage.  Some
> > of this garbage likely has CR and EOF characters in it, which are passed
> > verbatim in binary mode, but are handled completely differently in text mode.
> > If EOF is present in any of that garbage, the text mode commands will think
> > that this denotes the end of the file, and stop reading at that point.
> 
> The idea was not to set any other "aaaaaaaaaaaa..." in these variables.
> I'm not so stupid. What are these text mode commands?

I refer to the different behavior of the i/o commands depending on
whether the file they are working with is open in text or binary modes. 
In binary mode, all data is passed verbatim, with no modifications, and
the end of file condition is determined by DOS.  In text mode, however,
several conditions apply:

1) On output, every newline (LF, '\n') character in your data is
converted to the line-ending style of the operating system; in Unix,
it's still just an LF; in DOS it becomes a CRLF combination.

2) Input data is converted from the operating system style to ANSI C
style.  Specifically, under DOS, any CRLF pair read in is converted to
just LF.

3) The DOS end-of-file character (Ctrl-Z, 0x1A) denotes the end of the
file, and terminates all input commands.  You should always check the
return value of every file reading function in text mode, and use feof()
afterwards to tell end-of-file apart from a read error.
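
One way to write a text-mode read loop is sketched below.  The function
name count_lines is hypothetical.  The program only ever sees '\n' as the
line terminator, whatever the file looks like on disk, and the loop stops
when the read call itself reports that nothing more is available:

```c
#include <stdio.h>

/* Count the newline-terminated lines in a text file.
   Returns -1 if the file cannot be opened or a read error occurs. */
long count_lines(const char *path)
{
    FILE *f = fopen(path, "r");     /* no "b": text mode */
    long lines = 0;
    int c;                          /* int, so EOF is distinguishable */

    if (f == NULL)
        return -1;
    while ((c = fgetc(f)) != EOF) {
        if (c == '\n')              /* CRLF on disk arrives as '\n' */
            lines++;
    }
    if (ferror(f))                  /* loop ended on an error, not EOF */
        lines = -1;
    fclose(f);
    return lines;
}
```

Testing the return value of fgetc() drives the loop; ferror() is only
consulted afterwards to find out why it stopped.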

> > int main( void )                      /* THIS IS THE CORRECT WAY TO DECLARE MAIN */
> 
> No, void main ( void ) is! I'm right. You are right. In gcc there is no
> difference, or is there? Explain me please. Nowhere is said what is correct.

Again, the ANSI spec _specifically_ states only two correct ways to
declare main():

1) int main( int, char** )
    This format is used if you need to parse command-line arguments.

2) int main( void )
    Use this if you don't need to parse command-line arguments.

The int that main returns is necessary because it forms the program's
exit code to the operating system.  If you declare main as void, or fail
to return a value, the exit code is likely to be garbage, which can
seriously mess up any programs which depend on this code.  Examples of
such programs:  DOS batch files, UNIX shell scripts, Make, or any other
program which invokes yours with system() or spawn*().

Some compilers (DJGPP included) define an extension to the ANSI
specification which allows a third char** argument to main() which
contains the system environment, parsed in a manner similar to the
command-line arguments.  This is not standard, and thus should not be
counted on.  However, it is fundamentally incorrect to declare main() as
void.

> > {
> > static struct proov proov_zero;       /* this will be initialized to zero because it's static */
> > struct proov *p, *t;
> > FILE *fl;
> >
> >  p = malloc ( sizeof ( struct proov ) );    /* Let's allocate memory */
> >  t = malloc ( sizeof ( struct proov ) );
> >
> >  *p = proov_zero;     /* copy zeroed structure */
> >  *t = proov_zero;
> 
> Here I describe my understanding of stru. variables: I think * structures
> are like memory areas. Only I have declared to C that they consist of
> elements - int, short, char, float[256], ...  - 4, 2, 1, ... bytes in
> specific order. You copy the structures - why to waste memory? Better:
>         memset ( p, 0, sizeof (struct proov ) );
> If you don't like memset, use bzero. Do I have to say to memset that
> it must work in binary mode? I guess not!

No, memset also works.  I was trying to give an example that is not
dependent on direct memory access, but rather allows the compiler to do
things its own way.  I think that if you compile with optimizations (-O
switch), you will find that my way is nearly as fast as the memset()
technique.  Anyway, why should speed matter unless you call the function
millions of times?  The initialization only occurs once.

As for wasting memory, in a flat-model 4 GB address space, you're hardly
going to miss an extra 300 bytes or so.  Use whatever method you prefer;
I was merely trying to be concise.  :)
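
For comparison, a small sketch of both techniques side by side, using a
hypothetical struct sample and hypothetical function names.  Both leave
every member zero.  One difference that may matter if the structure is
later written out with fwrite(): memset() also clears any padding bytes
between members, while assignment is only guaranteed to set the members
themselves.

```c
#include <string.h>

/* Hypothetical structure; most compilers insert padding after tag. */
struct sample {
    char tag;
    int  value;
};

/* Static storage duration, so every member starts out as zero. */
static struct sample sample_zero;

/* Clear by copying the zeroed template; the compiler picks the method. */
void clear_by_assignment(struct sample *s)
{
    *s = sample_zero;
}

/* Clear by zero-filling the whole object, padding bytes included. */
void clear_by_memset(struct sample *s)
{
    memset(s, 0, sizeof *s);
}
```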

> >  char a[22000] = "";          /* define and initialize to all zeroes */
> >  char b[2000] = "";
> 
> All that zeroing is never needed. In my programs I use aaa[5000000] -
> every time to zero it?, why to waste time...

If you open files in binary mode, you don't need to initialize.  But if
you open them in text mode, you'd better unless you want to get buggy
results.  As they say:  "Garbage in, garbage out..."

> Well. Is out there anyone who is using fread/fwrite in not bin mode.
> I think nobody. So why is fread so stupid? Is it possible to make
> fread work in any mode? Another question: does gets/scanf/??? work
> in bin mode?

fread/fwrite are simply not designed to be used in text mode.  Why?
Because they copy raw bytes directly between memory and the file, and
don't convert values into a form that is readable as text.
gets() and fscanf() are similarly not designed to work in binary mode,
as they work with formatted text.  It's like mixing apples and oranges;
it's much simpler to use the right function for the right task instead
of complaining about it.

The next time you feel masochistic, why don't you try using gets() to
read from a binary file?  If you manage not to crash your program, I'll
be very impressed.

> Is there any other stupid problems I may rendezvous?

Lots, if you don't use things the way they are supposed to be used.  "To
everything there is a season/And a time to every purpose under heaven." 
This is a sentiment I think you would do well to learn.

-- 
John M. Aldrich, aka Fighteer I <fighteer AT cs DOT com>

-----BEGIN GEEK CODE BLOCK-----
Version: 3.1
GCS d- s+:- a-->? c++>$ U@>++$ p>+ L>++ E>+ W++ N++ o+ K? w(---) O-
M-- V? PS+ PE Y+ PGP- t+(-) 5- X- R+ tv+() b+++ DI++ D++ G e(*)>++++
h!() !r !y+()
------END GEEK CODE BLOCK------
