Mail Archives: djgpp/1996/03/19/01:29:43
I am the author of a disk cataloging program. One of its options is to
extract header information from archive and graphics files. When I converted
to v2, I found that these options ran much more slowly. I now understand
why, and I know how to fix it:
1. When you do file I/O in v2, all bytes pass through a region of memory in
the first MB called the transfer buffer. To minimize the number of
switches to real mode, the library reads ahead to fill this buffer.
Normally this is a good idea, but for random access to small amounts of
information on a floppy it is slow, because you read data you don't
need. Using stubedit to change the size of the transfer buffer, I
recorded the following times to index a full 1.2MB disk containing 11 JPEG
images:

    buffer       time (sec)
    1K           12
    2K           12
    4K           13
    8K           26
    16K          41
    32K          69

    Turbo C      5.2
    djgpp 1.2    5.9

Solution: use setvbuf to select a 2K buffer rather than accepting the
default. Use stubedit to set a 32K transfer buffer for efficient hard-disk I/O.
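
As a rough illustration of the setvbuf half of that solution, here is a minimal sketch in standard C. The function name read_header and the constant SMALL_BUF_SIZE are hypothetical, invented for this example; they are not taken from the cataloging program. The sketch assumes the header sits at the start of the file.

```c
#include <stdio.h>

/* Hypothetical size: 2K was among the fastest settings in the timings above. */
#define SMALL_BUF_SIZE 2048

/* Read the first n bytes of a file through a small stdio buffer.
   Returns the number of bytes actually read (0 if the file cannot be opened). */
size_t read_header(const char *path, void *dst, size_t n)
{
    char buf[SMALL_BUF_SIZE];   /* must stay alive until fclose */
    size_t got;
    FILE *fp = fopen(path, "rb");

    if (fp == NULL)
        return 0;

    /* setvbuf has to be called before any other operation on the stream. */
    if (setvbuf(fp, buf, _IOFBF, sizeof buf) != 0) {
        fclose(fp);
        return 0;
    }

    got = fread(dst, 1, n, fp);
    fclose(fp);                 /* buf is not used after this point */
    return got;
}
```

The buffer is supplied by the caller of setvbuf rather than left to the library, so its size is known for certain. It is a local array here only because the stream is closed before the function returns.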
2. The above change still left the indexing time roughly twice as long as
with Turbo C or djgpp 1.2. The cause is inefficiency in fseek. Any fseek
call seems to discard buffered information and generate a DOS seek call
even if the seek could be accomplished by moving a pointer into the
buffered file data.
Solution: a routine my_fseek which seeks by moving pointers if possible,
and otherwise calls the library fseek. (The Turbo C timing above already used
this trick.) Result: time reduced to 3.3 seconds, about 20 times faster than
the 69-second timing which started this exercise. One question remains:
why was v1.2 not as slow as v2 with a small xfer buffer? It seems to use
the same algorithm.
My_fseek is only used for reading, so I didn't have to worry about some of
the cases which the library fseek handles. Would anything break if the
library fseek handled short seeks, at least for reads, by moving a pointer if
the seek destination were already in the buffer?
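
The idea of satisfying a seek from already-buffered data can be sketched as follows. This is not the my_fseek routine described above, whose source is not shown and which presumably works on the library's own FILE buffer. Portable C cannot reach into FILE internals, so this stand-in keeps its own buffer and turns off stdio buffering on the underlying stream. All names here (Reader, reader_open, reader_seek, reader_read, reader_close, real_seeks) are hypothetical. It is read-only and handles absolute offsets only.

```c
#include <stdio.h>
#include <string.h>

#define RBUF_SIZE 2048

/* Hypothetical read-only wrapper with its own buffer.
   Invariant: the underlying file position is always base + len. */
typedef struct {
    FILE  *fp;
    long   base;        /* file offset of buf[0] */
    size_t len;         /* number of valid bytes in buf */
    size_t pos;         /* index in buf of the next byte to hand out */
    long   real_seeks;  /* how many times fseek was really called */
    unsigned char buf[RBUF_SIZE];
} Reader;

int reader_open(Reader *r, const char *path)
{
    r->fp = fopen(path, "rb");
    if (r->fp == NULL)
        return -1;
    setvbuf(r->fp, NULL, _IONBF, 0);   /* avoid double buffering */
    r->base = 0;
    r->len = r->pos = 0;
    r->real_seeks = 0;
    return 0;
}

/* Seek to an absolute offset. If the target lies inside the buffered
   data, only the position index moves and no fseek call is made. */
int reader_seek(Reader *r, long offset)
{
    if (offset >= r->base && offset - r->base <= (long)r->len) {
        r->pos = (size_t)(offset - r->base);
        return 0;
    }
    if (fseek(r->fp, offset, SEEK_SET) != 0)
        return -1;
    r->real_seeks++;
    r->base = offset;                  /* buffer is now empty at offset */
    r->len = r->pos = 0;
    return 0;
}

size_t reader_read(Reader *r, void *dst, size_t n)
{
    unsigned char *out = dst;
    size_t done = 0;

    while (done < n) {
        size_t chunk;
        if (r->pos == r->len) {        /* buffer exhausted: refill */
            r->base += (long)r->len;
            r->len = fread(r->buf, 1, sizeof r->buf, r->fp);
            r->pos = 0;
            if (r->len == 0)
                break;                 /* end of file or read error */
        }
        chunk = r->len - r->pos;
        if (chunk > n - done)
            chunk = n - done;
        memcpy(out + done, r->buf + r->pos, chunk);
        r->pos += chunk;
        done += chunk;
    }
    return done;
}

void reader_close(Reader *r)
{
    fclose(r->fp);
}
```

Restricting the wrapper to reads sidesteps the cases a general fseek has to handle, such as flushing pending writes, which matches the read-only use described above.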