www.delorie.com/archives/browse.cgi   search  
Mail Archives: djgpp/1996/12/22/11:01:28

Date: Sun, 22 Dec 1996 17:45:23 +0200 (IST)
From: Eli Zaretskii <eliz AT is DOT elta DOT co DOT il>
To: j DOT potman AT student DOT utwente DOT nl
cc: djgpp AT delorie DOT com
Subject: Re: program with profiling info runs slower with DJGPP 2.01
In-Reply-To: <58079.s9404198@student.utwente.nl>
Message-ID: <Pine.SUN.3.91.961222172358.973D-100000@is>
MIME-Version: 1.0

On Fri, 20 Dec 1996, Jordy Potman wrote:

> I've tried to reproduce the problem in a smaller program but haven't 
> succeeded yet. Small programs with only one simple function that is called 
> repeatedly (my program calls the main rendering function repeatedly) 
> don't show the problem. I think the problem is caused by repeatedly calling 
> a lot of functions (in my main rendering function there are a lot of calls 
> to other functions). This is not very easy to reproduce in a smaller 
> program.

One situation where this might happen is if the code fits into the CPU 
cache when compiled with v2.0, but is too large under v2.01, and so cache 
thrashing happens.  I'll admit I've never seen this happen except in test 
cases specifically designed to exhibit the problem, but it might just be 
the case.

> I have profiled my program compiled with v2.0 and v2.01. There are some 
> differences in the output of gprof.
> Profile compiled with DJGPP v2.00:
> Each sample counts as 0.055556 seconds.
>   %   cumulative   self              self     total           
>  time   seconds   seconds    calls  ms/call  ms/call  name    
>  71.05      1.50     1.50      151     9.93     9.93  BitBlt
>  10.53      1.72     0.22      151     1.47     1.47  ScanEdges
>   5.26      1.83     0.11    94475     0.00     0.00  DotProduct
>   5.26      1.94     0.11      151     0.74    13.24  UpdateWorld
>   5.26      2.06     0.11                             _int86
>   2.63      2.11     0.06    13139     0.00     0.01  ClipToPlane
>   0.00      2.11     0.00        1     0.00  1999.82  main
> 
> Profile compiled with DJGPP v2.01:
> Each sample counts as 0.055556 seconds.
>   %   cumulative   self              self     total           
>  time   seconds   seconds    calls  ms/call  ms/call  name    
>  33.33      0.11     0.11       18     6.17     6.17  BitBlt
>  33.33      0.22     0.11                             _int86
>  16.67      0.28     0.06      883     0.06     0.06  ClipToFrustum
>  16.67      0.33     0.06       18     3.09     3.09  ScanEdges
>   0.00      0.33     0.00    11394     0.00     0.00  DotProduct
>   0.00      0.33     0.00     1592     0.00     0.00  ClipToPlane
>   0.00      0.33     0.00       18     0.00    12.34  UpdateWorld
>   0.00      0.33     0.00        1     0.00   222.20  main

I'm a bit at a loss with these profiles.  First, the v2.01 profile seems
to indicate *faster* runtime, not slower: the ms/call figures are smaller
(6.17 for BitBlt in v2.01 as opposed to 9.93 in v2.0; other functions are
also consistently faster in v2.01).  Second, you obviously run both
versions for a *very* different elapsed times (a factor of 9 between 222ms
for the main function in v2.01 as opposed to 2000ms in v2.0), so this is
probably the cause of the different distribution of time in percents,
since the timer granularity is too crude to correctly measure short runs. 
You should run your program for much longer (say, tens or hundreds of
seconds) in order to obtain a meaningful profile. 

> I can't explain why _int86 takes such a large % of time in v2.01.

I submit that this is an artefact: the time is 0.22 seconds, which is
1/9th of the 2.0 seconds in the v2.0 run, which was almost exactly 9 times
longer in elapsed numbers (or do I miss something?).  So the different
percentage is most probably due to the fact that functions which take much
less time weren't sampled at all in the much shorter v2.01 run. 

> The 
> profile also shows that the program is a lot slower in v2.01, take a look 
> at the number of calls to UpdateWorld (the main rendering function).

I don't see what bothers you here.  UpdateWorld is the only function 
which has almost the same time per call in both versions.  Its number of 
calls (18 in v2.01 as opposed to 151 in v2.0) is consistent with the fact 
that the total runtime of the v2.01 version was 9 times shorter than that 
of the v2.0 version.


To sum up:

	1) The two versions were run for a short (and unequal) time which 
might make the profile inexact.  Please run both versions for much longer 
times and post the profiles then.

	2) If anything, the profile indicates that the v2.01 version is 
*faster*, not slower.  Please explain why do you think that the v2.01 
version is ``much slower''.

- Raw text -


  webmaster     delorie software   privacy  
  Copyright © 2019   by DJ Delorie     Updated Jul 2019