Sender: rich AT phekda DOT freeserve DOT co DOT uk
Message-ID: <3BC38161.C1FBC8F1@phekda.freeserve.co.uk>
Date: Tue, 09 Oct 2001 23:59:45 +0100
From: Richard Dawe
X-Mailer: Mozilla 4.77 [en] (X11; U; Linux 2.2.19 i586)
X-Accept-Language: de,fr
MIME-Version: 1.0
To: djgpp-workers AT delorie DOT com
Subject: Re: Resend: DJGPP and files > 2GB
References: <200110092052 DOT WAA16564 AT father DOT ludd DOT luth DOT se>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Reply-To: djgpp-workers AT delorie DOT com

Hello.

Martin Strömberg wrote:
>
> According to Richard Dawe:
> > An example of the problem is the handling of the st_size member of the
> > stat structure. st_size is an off_t. This means that the maximum size
> > that can be represented is 2GB - 1. Currently stat will return a large
> > negative number, since off_t is signed (which caused ls to output a
> > bogus size). If st_size cannot represent the file's size, stat is
> > supposed to return -1 and set errno to EOVERFLOW.
> >
> > I don't think the library's current behaviour is good - who knows what
> > effect silently returning bogus sizes for files >= 2GB will have?
>
> As the possible values are -2^31 - 2^31-1 there no ambiguity (if
> you're aware of that a negative value != -1 is really a big positive
> one).

Sure, there's no ambiguity for stat alone. But if stat reports file sizes above 2GB, then you may not be able to manipulate some portions of the file. E.g. you may want to use relative seeks to get to the top 2GB of a file, but you can't, because off_t is a signed value and cannot represent a seek of more than 2GB forwards from the current position. How would you express such a seek, when values above 2GB - 1 can't be represented because the negative values are used for backwards seeking?

I think the idea is that the file's size should be representable as a positive number in off_t, so that there's no doubt about its interpretation, i.e. no casts are needed to use it as a file size.
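A minimal C sketch of the point above, emulating a 32-bit off_t with int32_t so it runs on any host. The helpers store_st_size, relative_seek_fits and reinterpret_st_size are hypothetical illustrations, not DJGPP library functions, and the sketch assumes <errno.h> defines EOVERFLOW, as POSIX systems' headers do.

```c
#include <errno.h>
#include <stdint.h>

/* Largest size a 32-bit signed off_t can hold: 2GB - 1. */
#define OFF32_MAX INT32_MAX

/* Hypothetical helper (not a DJGPP function): copy a file size, which
   the filesystem knows as a 64-bit value, into a 32-bit st_size field.
   Instead of wrapping to a negative number, it fails with EOVERFLOW,
   which is what stat() is supposed to do. */
static int store_st_size(int64_t real_size, int32_t *st_size)
{
    if (real_size < 0 || real_size > OFF32_MAX) {
        errno = EOVERFLOW;
        return -1;          /* *st_size is left untouched */
    }
    *st_size = (int32_t) real_size;
    return 0;
}

/* Hypothetical helper: can a seek of `delta` bytes relative to `cur`
   be expressed when both the argument and the result are 32-bit off_t
   values?  A forward seek of 2GB or more cannot even be passed in. */
static int relative_seek_fits(int32_t cur, int64_t delta)
{
    int64_t target;

    if (delta < INT32_MIN || delta > INT32_MAX)
        return 0;
    target = (int64_t) cur + delta;
    return target >= 0 && target <= OFF32_MAX;
}

/* The workaround the text argues against: reinterpreting a wrapped,
   negative st_size as unsigned.  It only works below 4GB and needs a
   cast at every use. */
static uint32_t reinterpret_st_size(int32_t st_size)
{
    return (uint32_t) st_size;
}
```

For example, store_st_size(3221225472LL, &sz) (a 3GB file) returns -1 with errno set to EOVERFLOW instead of leaving a negative size in sz, and relative_seek_fits(0, 3000000000LL) is false because a 3GB delta cannot be passed as a 32-bit off_t at all.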
> But I'm not against any improvements as that result might confuse
> people.

Or lead to ugly hacks in the code, like the fix to ls?

> How does the type offset_t fit into the ways of addressing the
> problem (if you know)?

(I presume you mean off_t rather than offset_t.)

I just read the LFS document in more detail. It appears that each file descriptor has an offset maximum stored with it. Operations that would go beyond this offset maximum fail and set errno to EOVERFLOW. Storing the offset maximum in the file descriptor data allows the descriptor to be used with objects compiled with different off_t sizes. The offset maximum would be 2GB - 1 for us.

There are several different ways that a program could be compiled:

1. _LARGEFILE_SOURCE defined, compiled & linked with the LFS flags to use a large off_t;
2. _LARGEFILE_SOURCE defined, but compiled & linked without the LFS flags, so off_t keeps its normal size;
3. _LARGEFILE64_SOURCE defined, compiled & linked with the transitional library.

The LFS flags are returned by getconf. There are CFLAGS, LDFLAGS and LIBS variants, to pass on the compile and link lines. The example for case 1 from the LFS doc is:

    c89 $(getconf LFS_CFLAGS) a.c \
        $(getconf LFS_LDFLAGS) \
        $(getconf LFS_LIBS)

and for case 2:

    c89 -D_LARGEFILE_SOURCE a.c

Case 2 allows you to use the functions fseeko and ftello, which fail with errno set to EOVERFLOW when the offset cannot be stored in an off_t.

I don't think it's worth implementing the transitional API, so I won't explain case 3. See the LFS document for more details.

But this doesn't answer your question about the size of off_t. The off_t size can be controlled by defining _FILE_OFFSET_BITS to the number of bits in off_t. If it's not defined, then the default size should be used. If it's defined but does not correspond to a supported size (e.g. 37 bits), then an error should be generated.

The tricky part of supporting various sizes of off_t via _FILE_OFFSET_BITS is that #undef lseek, say, is not allowed to change the size of off_t that lseek takes.
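The per-descriptor offset maximum, and the compile-time rejection of an unsupported _FILE_OFFSET_BITS, can be sketched in C as follows. Everything here is hypothetical: struct fd_info, checked_lseek and OFFSET_MAX_32 are illustrative names, not DJGPP internals, and the sketch assumes 32 and 64 are the only supported off_t sizes and that <errno.h> defines EOVERFLOW.

```c
#include <errno.h>
#include <stdint.h>
#include <stdio.h>   /* SEEK_SET, SEEK_CUR, SEEK_END */

/* Reject off_t sizes the library does not support (say, 37 bits) at
   compile time; this sketch assumes only 32 and 64 are supported. */
#if defined(_FILE_OFFSET_BITS) && _FILE_OFFSET_BITS != 32 && _FILE_OFFSET_BITS != 64
#error "unsupported _FILE_OFFSET_BITS"
#endif

/* Offset maximum for a descriptor used by code with a 32-bit off_t. */
#define OFFSET_MAX_32 ((int64_t) INT32_MAX)

/* Hypothetical per-descriptor record; real fd bookkeeping differs.
   Offsets are kept in 64 bits internally. */
struct fd_info {
    int64_t position;    /* current file offset, never negative */
    int64_t size;        /* current file size, never negative */
    int64_t offset_max;  /* largest offset this descriptor may report */
};

/* Hypothetical lseek-style routine: fail with EINVAL for a bad whence
   or a negative result, and with EOVERFLOW when the result would exceed
   the descriptor's offset maximum.  On failure the position is kept. */
static int64_t checked_lseek(struct fd_info *fd, int64_t offset, int whence)
{
    int64_t base, target;

    switch (whence) {
    case SEEK_SET: base = 0;            break;
    case SEEK_CUR: base = fd->position; break;
    case SEEK_END: base = fd->size;     break;
    default:       errno = EINVAL;      return -1;
    }

    /* base is never negative, so only a large positive offset can
       overflow the 64-bit sum. */
    if (offset > 0 && base > INT64_MAX - offset) {
        errno = EOVERFLOW;
        return -1;
    }
    target = base + offset;

    if (target < 0) {
        errno = EINVAL;
        return -1;
    }
    if (target > fd->offset_max) {
        errno = EOVERFLOW;
        return -1;
    }
    fd->position = target;
    return target;
}
```

A descriptor set up for a 32-bit off_t (offset_max = OFFSET_MAX_32) on a 3GB file can seek to offset 150, but checked_lseek(&f, 0, SEEK_END) fails with EOVERFLOW and leaves the position alone. The same file behind a descriptor whose offset_max is INT64_MAX, as large-file-aware code would presumably get, reaches the end normally.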
You therefore can't alias, say, lseek to lseek64 just using a #define: after #undef lseek, calls would go to the lseek that takes the default-sized off_t. glibc appears to use some assembly magic to do the thunking. If the compiler doesn't support this method, then it falls back on... #defines.

So it seems like this work can be split into stages:

1. Make the file descriptor functions aware of the offset maximum. Add support for EOVERFLOW (and other error codes) as listed in the LFS/POSIX drafts.
2. Add support for a larger off_t based on _FILE_OFFSET_BITS.
3. Add support for the transitional API, if anyone cares.

BTW autoconf has tests for the large file features, which fileutils uses.

Bye, Rich =]

-- 
Richard Dawe
http://www.phekda.freeserve.co.uk/richdawe/