From: Eli Zaretskii
Newsgroups: comp.os.msdos.djgpp
Subject: Re: Reading directories, readdir/stat too slow
Date: Mon, 4 Oct 1999 10:23:27 +0200
Organization: NetVision Israel
Lines: 83
Message-ID:
References: <37f307e1 DOT 967161774 AT news DOT xmission DOT com> <37f3c4e1 DOT 1015552570 AT news DOT xmission DOT com>
NNTP-Posting-Host: is.elta.co.il
Mime-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
X-Trace: news.netvision.net.il 939025544 558 199.203.121.2 (4 Oct 1999 08:25:44 GMT)
X-Complaints-To: abuse AT netvision DOT net DOT il
NNTP-Posting-Date: 4 Oct 1999 08:25:44 GMT
X-Sender: eliz AT is
In-Reply-To: <37f3c4e1.1015552570@news.xmission.com>
To: djgpp AT delorie DOT com
DJ-Gateway: from newsgroup comp.os.msdos.djgpp
Reply-To: djgpp AT delorie DOT com
X-Mailing-List: djgpp AT delorie DOT com
X-Unsubscribes-To: listserv AT delorie DOT com
Precedence: bulk

On Thu, 30 Sep 1999, Scott Brown wrote:

> The findfirst test finished in 1.98 seconds, while the readdir/stat
> test finished in a whopping 133.96 seconds.

I tested this on two different but similar machines, both on DOS and
Windows 95, and I don't see such a large ratio.  The tests were run on
two similar disks with about 26,000 files in some 900 directories.
Here are my results:

  method                     DOS 5.0      Windows 95 4.00.950
  findfirst/findnext           4 sec            6 sec
  readdir/stat(default)      100 sec          400 sec
  readdir/stat(optimal)       53 sec          165 sec
  readdir/access              32 sec           75 sec
  readdir/stat(nlink)         23 sec           82 sec

The last two methods need a few words of explanation.

readdir/access uses `access' instead of `stat'; this will only work if
all you need to know is whether the file is a directory or not.

readdir/stat(nlink) uses an optimization of directory traversal:
after you have seen st_nlink-2 subdirectories, you *know* there are no
more subdirectories in this directory, so you can stop calling `stat'
for the purpose of finding subdirectories.

Since your application needs the size and time stamp of the files, the
last two methods won't do for you.

The above numbers differ from yours in one crucial aspect: the
difference between the default and the optimal stat flags *is*
significant in my case (see the sketch below for what I mean by
`optimal').  I don't know why it was negligible in your case; if you
are using a FAT32 volume, it might be due to the difference in the
filesystem code (mine is FAT16).

Also note how much better the performance is on DOS than on Windows.
(I did make sure that all other applications were idle when I ran the
tests.)

In any case, the ratio of the findfirst/findnext method to
readdir/stat, even on Windows, is not 1:100, but about 1:65 for the
default `stat', and 1:30 for the optimal `stat'.  A real-world
application, which actually does something with the files in addition
to just traversing the directory tree, should show smaller ratios.

> I believe my test code is fair, but second opinions are welcome

Actually, it's about as unfair as a fair program can be ;-).  The
findfirst/findnext method calls findfirst/findnext only once for any
given file, and then gets all its info from that call.  In contrast,
readdir/stat calls findfirst/findnext at least twice (once in readdir
and another time in stat).  In addition, findfirst (called by stat for
every file) is about 10 times slower than findnext--this is the reason
behind the 1:10 rule of thumb I was citing earlier.
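To make the contrast concrete, here is a rough sketch of what the
findfirst/findnext method boils down to (the `walk' function and the
details are mine for illustration, not code from your test program):

#include <stdio.h>
#include <string.h>
#include <dir.h>    /* findfirst, findnext, struct ffblk, FA_DIREC */

/* Visit every file under DIR.  One findfirst/findnext call per file;
   the name, attributes, size and time stamp all come from the ffblk
   structure that call fills in, so no extra system calls are needed.
   (No length checks, for brevity.)  */
static void walk (const char *dir)
{
  char pattern[FILENAME_MAX];
  struct ffblk f;
  int done;

  sprintf (pattern, "%s/*.*", dir);
  for (done = findfirst (pattern, &f, FA_DIREC);
       !done;
       done = findnext (&f))
    {
      if (strcmp (f.ff_name, ".") == 0 || strcmp (f.ff_name, "..") == 0)
        continue;
      if (f.ff_attrib & FA_DIREC)
        {
          char sub[FILENAME_MAX];

          sprintf (sub, "%s/%s", dir, f.ff_name);
          walk (sub);       /* recurse into the subdirectory */
        }
      else
        printf ("%s/%s: %lu bytes, date 0x%x, time 0x%x\n",
                dir, f.ff_name, (unsigned long)f.ff_fsize,
                (unsigned)f.ff_fdate, (unsigned)f.ff_ftime);
    }
}

Every readdir/stat pair has to do much more work than that for each
file it looks at.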
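By the way, when I say `optimal' stat flags above, I mean telling the
DJGPP-specific `_djstat_flags' variable to skip the expensive parts of
`stat' that the application doesn't need.  Something along these lines
(a sketch only--check the libc reference for the exact flags available
in your version, and only turn off the features your application can
really afford to lose):

#include <sys/stat.h>

int main (void)
{
  /* Each bit that is set turns the corresponding stat feature OFF;
     setting all of them gives the cheapest possible stat.  Do this
     once, before the directory traversal starts.  */
  _djstat_flags = _STAT_INODE | _STAT_EXEC_EXT | _STAT_EXEC_MAGIC
                  | _STAT_DIRSIZE | _STAT_ROOT_TIME | _STAT_WRITEBIT;

  /* ... then traverse the tree with readdir/stat as usual ... */
  return 0;
}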
And if that's not enough, stat invokes lots of other system calls:
_truename and _fixpath, to name just two; it also calls time-related
functions to translate the file's time stamp into a Posix-compliant
time_t value (this includes computation of the time zone, use of DST
rules, etc.).  As I said: stat is expensive, especially when used
massively.

> My test program is fairly representative of the kind of directory
> traversal code I use in my applications.

If you want to speed up your application and still keep it portable,
I suggest trying to restructure the way you traverse the files.  As
written, the test program uses a traversal method that is optimal for
findfirst/findnext, and it will always lose badly with the Posix
functions (because the OS kernel doesn't cache the findfirst/findnext
entries the way Unix does for stat).  Using a different traversal
method might make the program faster.  I cannot suggest any specific
algorithm without knowing what the application does with the file
information it gathers.

One minor optimization you could use is to start the search from a
fully qualified directory name.  The test program uses ".", which
forces _fixpath to call DOS for the default drive and directory FOR
EACH FILE it processes.  This alone amounts to 5-10 seconds of run
time.
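A minimal sketch of that last point, reusing the `walk' routine from
the sketch earlier in this message (again, the names are only for
illustration):

#include <stdio.h>     /* FILENAME_MAX, perror */
#include <unistd.h>    /* getcwd */

void walk (const char *dir);    /* the traversal routine, e.g. the
                                   walk sketch above */

int main (void)
{
  char start[FILENAME_MAX];

  /* Expand "." once, up front, so the library doesn't have to ask DOS
     for the default drive and directory again for every file.  */
  if (getcwd (start, sizeof start) == NULL)
    {
      perror ("getcwd");
      return 1;
    }
  walk (start);                 /* instead of walk (".") */
  return 0;
}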