Mail Archives: djgpp/1999/09/30/18:56:21
On Thu, 30 Sep 1999 16:17:21 +0200, Eli Zaretskii <eliz AT is DOT elta DOT co DOT il>
wrote:
>
>On Thu, 30 Sep 1999, Scott Brown wrote:
>
>> Unfortunately, I find that opendir/readdir, when combined with a stat
>> call (to get file mode/size/etc) is a *lot* slower than findfirst,
>> taking on the order of 70-100 times longer to perform the same work.
>> When running against tens of thousands of files in hundreds of
>> directories, it is a significant problem.
>
>stat is expensive (it isn't easy to get all that info on DOS; you won't
>believe how closely DOS guards some of its dirty secrets ;-). But 100
>times slower seems to be too much; I suspect your system is not set up in
>an optimal way. See section 3.9 of the FAQ; in particular, make sure you
>have a disk cache installed.
Well, I'm running under Windows 95, which has a built-in disk cache,
and my hardware is plenty powerful for this kind of work (K6-200
w/64 MB). Statistics from sysmon are limited, but it shows that my
disk cache hasn't dropped below 2.5 MB in the last little while.
>A 10-fold slow-down when using stat is
>something I would expect, but not 100-fold.
It seemed pretty ridiculous to me as well. I put together a simple
timed test and ran it against a set of about 30,000 files in 350
directories; I ran each test twice and took the second result, to give
the OS a fair chance to load the cache.
The findfirst test finished in 1.98 seconds, while the readdir/stat
test finished in a whopping 133.96 seconds. I believe my test code is
fair, but second opinions are welcome; grab a copy here:
ftp://ftp.xmission.com/pub/users/s/skb/pub/dirtest.c
>Read the docs ;-).
That always helps...
>No, seriously: the documentation of _djstat_flags in libc.info describes
>several flags that can be set to disable computing some expensive
>members of struct stat for which you don't have any use. For the
>fastest operation, you should disable all features but those which your
>application needs. Doing so is known to speed up stat tremendously.
I only need the mode, and the size and timestamp for files. After I
R'd TFM, I tried setting *all* of the _STAT... bits, but the results
were disappointing; certainly not what I'd term a "tremendous"
improvement. Instead of taking 133.96 seconds to finish, the test
took only 125.44 seconds.
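
For reference, setting the bits might look roughly like the sketch below. The flag names are the ones I believe libc.info documents for _djstat_flags in <sys/stat.h>; treat them as an assumption and check them against your own copy of the docs. The function names are hypothetical. The DJGPP-specific part is guarded with __DJGPP__ so the same file still compiles elsewhere, where it simply does nothing special.

```c
#include <sys/stat.h>

/* Hypothetical helper: ask DJGPP's stat to skip the fields that are
   expensive to compute on DOS.  Mode, size and timestamp of ordinary
   files are still filled in. */
void disable_expensive_stat_fields(void)
{
#ifdef __DJGPP__
    /* Assumed flag names, as described for _djstat_flags in libc.info;
       verify them against your installed <sys/stat.h>. */
    _djstat_flags = _STAT_INODE      /* don't compute inode numbers       */
                  | _STAT_EXEC_EXT   /* don't check executable extensions */
                  | _STAT_EXEC_MAGIC /* don't read files for magic bytes  */
                  | _STAT_DIRSIZE    /* don't compute directory sizes     */
                  | _STAT_ROOT_TIME  /* don't look up root dir time stamp */
                  | _STAT_WRITEBIT;  /* don't probe for write access      */
#endif
}

/* Hypothetical helper: returns 1 if `path` is a directory, 0 if it is
   something else, and -1 if stat fails. */
int is_directory(const char *path)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return -1;
    return S_ISDIR(st.st_mode) ? 1 : 0;
}
```

The flags only need to be set once, before the traversal starts. With the numbers above, the saving works out to about 8.5 seconds out of 133.96, or a little over 6%.
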
My test program is fairly representative of the kind of directory
traversal code I use in my applications. Could there be something
else that I am overlooking?