Date: Sun, 22 Dec 1996 17:45:23 +0200 (IST) From: Eli Zaretskii To: j DOT potman AT student DOT utwente DOT nl cc: djgpp AT delorie DOT com Subject: Re: program with profiling info runs slower with DJGPP 2.01 In-Reply-To: <58079.s9404198@student.utwente.nl> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII On Fri, 20 Dec 1996, Jordy Potman wrote: > I've tried to reproduce the problem in a smaller program but haven't > succeeded yet. Small programs with only one simple function that is called > repeatedly (my program calls the main rendering function repeatedly) > don't show the problem. I think the problem is caused by repeatedly calling > a lot of functions (in my main rendering function there are a lot of calls > to other functions). This is not very easy to reproduce in a smaller > program. One situation where this might happen is if the code fits into the CPU cache when compiled with v2.0, but is too large under v2.01, and so cache thrashing happens. I'll admit I've never seen this happen except in test cases specifically designed to exhibit the problem, but it might just be the case. > I have profiled my program compiled with v2.0 and v2.01. There are some > differences in the output of gprof. > Profile compiled with DJGPP v2.00: > Each sample counts as 0.055556 seconds. > % cumulative self self total > time seconds seconds calls ms/call ms/call name > 71.05 1.50 1.50 151 9.93 9.93 BitBlt > 10.53 1.72 0.22 151 1.47 1.47 ScanEdges > 5.26 1.83 0.11 94475 0.00 0.00 DotProduct > 5.26 1.94 0.11 151 0.74 13.24 UpdateWorld > 5.26 2.06 0.11 _int86 > 2.63 2.11 0.06 13139 0.00 0.01 ClipToPlane > 0.00 2.11 0.00 1 0.00 1999.82 main > > Profile compiled with DJGPP v2.01: > Each sample counts as 0.055556 seconds. > % cumulative self self total > time seconds seconds calls ms/call ms/call name > 33.33 0.11 0.11 18 6.17 6.17 BitBlt > 33.33 0.22 0.11 _int86 > 16.67 0.28 0.06 883 0.06 0.06 ClipToFrustum > 16.67 0.33 0.06 18 3.09 3.09 ScanEdges > 0.00 0.33 0.00 11394 0.00 0.00 DotProduct > 0.00 0.33 0.00 1592 0.00 0.00 ClipToPlane > 0.00 0.33 0.00 18 0.00 12.34 UpdateWorld > 0.00 0.33 0.00 1 0.00 222.20 main I'm a bit at a loss with these profiles. First, the v2.01 profile seems to indicate *faster* runtime, not slower: the ms/call figures are smaller (6.17 for BitBlt in v2.01 as opposed to 9.93 in v2.0; other functions are also consistently faster in v2.01). Second, you obviously run both versions for a *very* different elapsed times (a factor of 9 between 222ms for the main function in v2.01 as opposed to 2000ms in v2.0), so this is probably the cause of the different distribution of time in percents, since the timer granularity is too crude to correctly measure short runs. You should run your program for much longer (say, tens or hundreds of seconds) in order to obtain a meaningful profile. > I can't explain why _int86 takes such a large % of time in v2.01. I submit that this is an artefact: the time is 0.22 seconds, which is 1/9th of the 2.0 seconds in the v2.0 run, which was almost exactly 9 times longer in elapsed numbers (or do I miss something?). So the different percentage is most probably due to the fact that functions which take much less time weren't sampled at all in the much shorter v2.01 run. > The > profile also shows that the program is a lot slower in v2.01, take a look > at the number of calls to UpdateWorld (the main rendering function). I don't see what bothers you here. UpdateWorld is the only function which has almost the same time per call in both versions. Its number of calls (18 in v2.01 as opposed to 151 in v2.0) is consistent with the fact that the total runtime of the v2.01 version was 9 times shorter than that of the v2.0 version. To sum up: 1) The two versions were run for a short (and unequal) time which might make the profile inexact. Please run both versions for much longer times and post the profiles then. 2) If anything, the profile indicates that the v2.01 version is *faster*, not slower. Please explain why do you think that the v2.01 version is ``much slower''.