Date: Thu, 4 Feb 1999 22:38:22 -0500
Message-Id: <199902050338.WAA19883@envy.delorie.com>
From: DJ Delorie <dj AT delorie DOT com>
To: djgpp AT delorie DOT com
In-reply-to: <36BA52DD.98F5C749@mpx.com.au> (message from Daniel on Fri, 05
	Feb 1999 13:09:33 +1100)
Subject: Re: Question about long long math on intel archs
References: <010501be5064$215c4780$1e2d87ca AT default> <36BA52DD DOT 98F5C749 AT mpx DOT com DOT au>
Reply-To: djgpp AT delorie DOT com


> PS: I hope nobody was compiling this code with optimisation on.
> When I do that, everything is much MUCH faster, and long and long-long
> ops take exactly the same amount of time.  I assume this is because
> gcc sees the multiplications as useless, since we never use the
> values of foo and bar.

The way to avoid this is to make your benchmark a global function that
stores its results in global variables, so gcc can't prove the results
are unused, like this:

/* Globals: gcc can't tell who else reads a1 and a2, so it has to
   keep the multiplies. */
long a1, b1, c1, a2, b2, c2;

void benchmark(void)
{
  a1 = b1 * c1;
  a2 = b2 * c2;
}

Make sure your benchmark function is defined *after* your testing
function, so that it won't be inlined.  You can also write a baseline
function that is exactly the same except that it does no multiplies.
Time it the same way to measure the overhead, then subtract that from
your overall timings, leaving just the time for the multiplies
themselves.
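A minimal sketch of the baseline-subtraction approach, assuming a modern gcc.  Current gcc compiles the whole file before deciding what to inline, so definition order alone no longer prevents inlining; the sketch uses gcc's `__attribute__((noinline))` instead.  The names `baseline`, `time_calls` and `multiply_time` are hypothetical, and `clock()` is coarse, so `n` needs to be large.

```c
#include <time.h>

/* Globals, so gcc can't prove the results are unused. */
long a1, b1, c1, a2, b2, c2;

/* Measured function.  noinline (a gcc extension) keeps it out of the
   timing loop regardless of where it is defined. */
__attribute__((noinline)) void benchmark(void)
{
  a1 = b1 * c1;
  a2 = b2 * c2;
}

/* Baseline: identical except that it does no multiplies. */
__attribute__((noinline)) void baseline(void)
{
  a1 = b1;
  a2 = b2;
}

/* CPU seconds spent making n calls to fn, as measured by clock(). */
static double time_calls(void (*fn)(void), long n)
{
  clock_t start = clock();
  for (long i = 0; i < n; i++)
    fn();
  return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* Estimated time for the multiplies alone in n calls: total time
   minus baseline overhead.  Computed in two statements so the calls
   happen in a fixed order. */
double multiply_time(long n)
{
  double total = time_calls(benchmark, n);
  double overhead = time_calls(baseline, n);
  return total - overhead;
}
```

Call `multiply_time()` from `main()` with a large count, once with `long` globals and once with `long long` globals.  When timer noise exceeds the multiply cost, the estimate can come out slightly negative.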

Write a whole bunch of multiplies, like 10 or 20, in line in the
function body to reduce the effects of the testing overhead (loops,
function calls).  Use a macro to save typing, if you can figure out
the ANSI token-pasting (##) syntax.
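A sketch of the macro approach, assuming ten multiplies per call; `MUL`, `DECL` and `benchmark10` are hypothetical names.  `##` pastes the macro argument onto the identifier, so `MUL(3)` expands to `a3 = b3 * c3;`.

```c
/* DECL(n) declares the globals an, bn, cn using ## token pasting. */
#define DECL(n) long a##n, b##n, c##n;

/* MUL(n) expands to one multiply: an = bn * cn; */
#define MUL(n) a##n = b##n * c##n;

DECL(1) DECL(2) DECL(3) DECL(4) DECL(5)
DECL(6) DECL(7) DECL(8) DECL(9) DECL(10)

/* Ten independent multiplies per call, so the loop and call
   overhead is spread over more work. */
void benchmark10(void)
{
  MUL(1) MUL(2) MUL(3) MUL(4) MUL(5)
  MUL(6) MUL(7) MUL(8) MUL(9) MUL(10)
}
```

Changing `long` to `long long` in `DECL` gives the other variant with no other edits.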

Another option is to use a loop that multiplies elements of two arrays
and stores the products in a third.
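A sketch of the array variant; `elem_t`, `N` and `array_benchmark` are hypothetical names, and the `typedef` switches between `long` and `long long`.  At higher optimization levels a modern gcc may auto-vectorize this loop, in which case it no longer measures the plain scalar multiply sequence.

```c
#include <stddef.h>

/* Element type under test; change to long for the other variant. */
typedef long long elem_t;

#define N 1000

/* Global arrays, so gcc can't discard the products. */
elem_t x[N], y[N], z[N];

/* Store x[i] * y[i] in z[i] for every i, repeated `passes` times. */
void array_benchmark(int passes)
{
  for (int p = 0; p < passes; p++)
    for (size_t i = 0; i < N; i++)
      z[i] = x[i] * y[i];
}
```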

You *do* want to optimize when testing stuff like this.  I suspect it
would make a big difference in the long vs. long long ratios.

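Based on the code gcc is generating, I'd expect long long multiplies
to take a little more than three times as long as long multiplies,
since it uses three multiply instructions plus some adds: one full
32x32->64 multiply of the low halves, and two cross products of which
only the low 32 bits matter.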
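To see where the three multiplies come from, split each 64-bit operand into 32-bit halves.  Modulo 2^64, a*b = a_lo*b_lo + ((a_hi*b_lo + a_lo*b_hi) mod 2^32) * 2^32, and the a_hi*b_hi term is shifted out entirely.  The sketch below (`mul64_via32` is a hypothetical name) computes that in portable C; it illustrates the arithmetic, not gcc's actual instruction sequence.

```c
#include <stdint.h>

/* Low 64 bits of a * b, built from three 32x32 multiplies. */
uint64_t mul64_via32(uint64_t a, uint64_t b)
{
  uint32_t a_lo = (uint32_t)a, a_hi = (uint32_t)(a >> 32);
  uint32_t b_lo = (uint32_t)b, b_hi = (uint32_t)(b >> 32);

  /* Multiply 1: full 64-bit product of the low halves
     (on 32-bit x86 this fits one widening MUL, result in EDX:EAX). */
  uint64_t lo = (uint64_t)a_lo * b_lo;

  /* Multiplies 2 and 3: cross terms.  Only their low 32 bits reach
     the result, so 32-bit wraparound here is harmless. */
  uint32_t cross = a_hi * b_lo + a_lo * b_hi;

  /* a_hi * b_hi would land at bit 64 and above, so it is skipped. */
  return lo + ((uint64_t)cross << 32);
}
```

Signed and unsigned multiplication produce the same low 64 bits in two's complement, so the same three-multiply sequence serves `long long` as well.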