From: Eli Zaretskii
Newsgroups: comp.os.msdos.djgpp
Subject: Re: Reading directories, readdir/stat too slow
Date: Mon, 4 Oct 1999 10:23:27 +0200
Organization: NetVision Israel
Lines: 83
Message-ID:
References: <37f307e1 DOT 967161774 AT news DOT xmission DOT com> <37f3c4e1 DOT 1015552570 AT news DOT xmission DOT com>
NNTP-Posting-Host: is.elta.co.il
Mime-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
X-Trace: news.netvision.net.il 939025544 558 199.203.121.2 (4 Oct 1999 08:25:44 GMT)
X-Complaints-To: abuse AT netvision DOT net DOT il
NNTP-Posting-Date: 4 Oct 1999 08:25:44 GMT
X-Sender: eliz AT is
In-Reply-To: <37f3c4e1.1015552570@news.xmission.com>
To: djgpp AT delorie DOT com
DJ-Gateway: from newsgroup comp.os.msdos.djgpp
Reply-To: djgpp AT delorie DOT com
X-Mailing-List: djgpp AT delorie DOT com
X-Unsubscribes-To: listserv AT delorie DOT com
Precedence: bulk

On Thu, 30 Sep 1999, Scott Brown wrote:

> The findfirst test finished in 1.98 seconds, while the readdir/stat
> test finished in a whopping 133.96 seconds.

I tested this on two different but similar machines, both on DOS and
Windows 95, and I don't see such a large ratio.  The tests were run on
two similar disks with about 26,000 files in some 900 directories.
Here are my results:

  method                     DOS 5.0      Windows 95 4.00.950
  findfirst/findnext           4 sec            6 sec
  readdir/stat(default)      100 sec          400 sec
  readdir/stat(optimal)       53 sec          165 sec
  readdir/access              32 sec           75 sec
  readdir/stat(nlink)         23 sec           82 sec

The last two methods need a few words of explanation.

readdir/access uses `access' instead of `stat'; this will only work if
all you need to know is whether the file is a directory or not.

readdir/stat(nlink) uses an optimization of directory traversal:
after you have seen st_nlink-2 subdirectories, you *know* there are no
more subdirectories in this directory, so you can stop calling `stat'
for the purpose of finding subdirectories.

Since your application needs the size and time stamp of the files, the
last two methods won't do for you.

The above numbers differ from yours in one crucial aspect: the
difference between the default and the optimal stat flags *is*
significant in my case (see the sketch below for what I mean by
`optimal').  I don't know why it was negligible in your case; if you
are using a FAT32 volume, it might be due to the difference in the
filesystem code (mine is FAT16).

Also note how much better the performance is on DOS than on Windows.
(I did make sure that all other applications were idle when I ran the
tests.)

In any case, the ratio of the findfirst/findnext method to
readdir/stat, even on Windows, is not 1:100, but about 1:65 for the
default `stat', and 1:30 for the optimal `stat'.  A real-world
application, which actually does something with the files in addition
to just traversing the directory tree, should show smaller ratios.

> I believe my test code is fair, but second opinions are welcome

Actually, it's about as unfair as a fair program can be ;-).  The
findfirst/findnext method calls findfirst/findnext only once for any
given file, and then gets all its info from that call.  In contrast,
readdir/stat calls findfirst/findnext at least twice (once in readdir
and another time in stat).  In addition, findfirst (called by stat for
every file) is about 10 times slower than findnext--this is the reason
behind the 1:10 rule of thumb I was citing earlier.
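To make the contrast concrete, here is a rough sketch of what the
findfirst/findnext method boils down to (the `walk' function and the
details are mine for illustration, not code from your test program):

#include <stdio.h>
#include <string.h>
#include <dir.h>    /* findfirst, findnext, struct ffblk, FA_DIREC */

/* Visit every file under DIR.  One findfirst/findnext call per file;
   the name, attributes, size and time stamp all come from the ffblk
   structure that call fills in, so no extra system calls are needed.
   (No length checks, for brevity.)  */
static void walk (const char *dir)
{
  char pattern[FILENAME_MAX];
  struct ffblk f;
  int done;

  sprintf (pattern, "%s/*.*", dir);
  for (done = findfirst (pattern, &f, FA_DIREC);
       !done;
       done = findnext (&f))
    {
      if (strcmp (f.ff_name, ".") == 0 || strcmp (f.ff_name, "..") == 0)
        continue;
      if (f.ff_attrib & FA_DIREC)
        {
          char sub[FILENAME_MAX];

          sprintf (sub, "%s/%s", dir, f.ff_name);
          walk (sub);       /* recurse into the subdirectory */
        }
      else
        printf ("%s/%s: %lu bytes, date 0x%x, time 0x%x\n",
                dir, f.ff_name, (unsigned long)f.ff_fsize,
                (unsigned)f.ff_fdate, (unsigned)f.ff_ftime);
    }
}

Every readdir/stat pair has to do much more work than that for each
file it looks at.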
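By the way, when I say `optimal' stat flags above, I mean telling the
DJGPP-specific `_djstat_flags' variable to skip the expensive parts of
`stat' that the application doesn't need.  Something along these lines
(a sketch only--check the libc reference for the exact flags available
in your version, and only turn off the features your application can
really afford to lose):

#include <sys/stat.h>

int main (void)
{
  /* Each bit that is set turns the corresponding stat feature OFF;
     setting all of them gives the cheapest possible stat.  Do this
     once, before the directory traversal starts.  */
  _djstat_flags = _STAT_INODE | _STAT_EXEC_EXT | _STAT_EXEC_MAGIC
                  | _STAT_DIRSIZE | _STAT_ROOT_TIME | _STAT_WRITEBIT;

  /* ... then traverse the tree with readdir/stat as usual ... */
  return 0;
}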
And if that's not enough, stat invokes lots of other system calls:
_truename and _fixpath, to name just two; it also calls time-related
functions to translate the file's time stamp into a Posix-compliant
time_t value (this includes computation of the time zone, use of DST
rules, etc.).  As I said: stat is expensive, especially when used
massively.

> My test program is fairly representative of the kind of directory
> traversal code I use in my applications.

If you want to speed up your application and still keep it portable,
I suggest trying to restructure the way you traverse the files.  As
written, the test program uses a traversal method that is optimal for
findfirst/findnext, and it will always lose badly with the Posix
functions (because the OS kernel doesn't cache the findfirst/findnext
entries the way Unix does for stat).  Using a different traversal
method might make the program faster.  I cannot suggest any specific
algorithm without knowing what the application does with the file
information it gathers.

One minor optimization you could use is to start the search from a
fully qualified directory name.  The test program uses ".", which
forces _fixpath to call DOS for the default drive and directory FOR
EACH FILE it processes.  This alone amounts to 5-10 seconds of run
time.
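A minimal sketch of that last point, reusing the `walk' routine from
the sketch earlier in this message (again, the names are only for
illustration):

#include <stdio.h>     /* FILENAME_MAX, perror */
#include <unistd.h>    /* getcwd */

void walk (const char *dir);    /* the traversal routine, e.g. the
                                   walk sketch above */

int main (void)
{
  char start[FILENAME_MAX];

  /* Expand "." once, up front, so the library doesn't have to ask DOS
     for the default drive and directory again for every file.  */
  if (getcwd (start, sizeof start) == NULL)
    {
      perror ("getcwd");
      return 1;
    }
  walk (start);                 /* instead of walk (".") */
  return 0;
}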