Date: Thu, 4 Feb 1999 22:38:22 -0500
Message-Id: <199902050338.WAA19883@envy.delorie.com>
From: DJ Delorie
To: djgpp AT delorie DOT com
In-reply-to: <36BA52DD.98F5C749@mpx.com.au> (message from Daniel on Fri, 05 Feb 1999 13:09:33 +1100)
Subject: Re: Question about long long math on intel archs
References: <010501be5064$215c4780$1e2d87ca AT default> <36BA52DD DOT 98F5C749 AT mpx DOT com DOT au>
Reply-To: djgpp AT delorie DOT com

> PS: i hope nobody was compiling this code with optimisation on.
> when i do that everything is much MUCH faster and long and long-long
> ops take exactly the same amount of time. I assume this is becuz
> gcc see's the multiplications as useless, since we never use the
> values of foo and bar.

The way to avoid this is to define your benchmark function as a global
function that uses global variables, like this:

long a1, b1, c1, a2, b2, c2;

void benchmark()
{
  a1 = b1 * c1;
  a2 = b2 * c2;
}

Make sure your benchmark function is defined *after* your testing
function, so that it won't be inlined.

You can write a similar function that is exactly the same except that
it does no multiplies, and use that as a baseline to measure the
overhead, which you then subtract from your overall timings, leaving
just the times for the multiplies themselves.

Write a whole bunch of multiplies inline in the benchmark function,
like 10 or 20, to reduce the effects of the testing overhead (loops,
function calls). Use a macro to ease use, if you can figure out the
ANSI token-pasting syntax (##). Another option is to use a loop that
multiplies elements of two arrays and stores the results in a third.

You *do* want to optimize when testing stuff like this. I suspect it
would make a big difference in the ratios. Based on the code gcc is
generating, I'd expect long long multiplies to take a little more than
three times longer than long multiplies, since it's using three
multiply opcodes and some adds.
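
For illustration, here is a rough sketch of the array variant with a
baseline, assuming a hosted C compiler with `long long` support. The
names (`N`, `time_calls`, `fill_inputs`, `mul_long`, `baseline_long`,
and so on) are made up for this example and are not from the original
thread. A newer compiler that inlines across the whole program may need
extra measures that this sketch does not attempt.

```c
#include <time.h>

#define N 16   /* multiplies per call; an arbitrary choice for this sketch */

/* Global arrays: the results stay visible outside each function,
   so the multiplies cannot be discarded as unused. */
long la[N], lb[N], lc[N];
long long lla[N], llb[N], llc[N];

/* Prototypes only. The bodies come after time_calls(), following the
   advice to define the benchmark after the code that calls it. */
void mul_long(void);
void baseline_long(void);
void mul_long_long(void);
void baseline_long_long(void);

/* Hypothetical helper: CPU seconds spent calling fn() reps times. */
double time_calls(void (*fn)(void), long reps)
{
    clock_t start = clock();
    long i;

    for (i = 0; i < reps; i++)
        fn();
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* Give the inputs nonzero values; call once before timing. */
void fill_inputs(void)
{
    int i;

    for (i = 0; i < N; i++) {
        lb[i] = i + 3;
        lc[i] = 2 * i + 1;
        llb[i] = 100000LL + i;
        llc[i] = 200000LL + i;
    }
}

void mul_long(void)
{
    int i;
    for (i = 0; i < N; i++)
        la[i] = lb[i] * lc[i];
}

/* Same loop and stores as mul_long(), but no multiply. */
void baseline_long(void)
{
    int i;
    for (i = 0; i < N; i++)
        la[i] = lb[i];
}

void mul_long_long(void)
{
    int i;
    for (i = 0; i < N; i++)
        lla[i] = llb[i] * llc[i];
}

/* Same loop and stores as mul_long_long(), but no multiply. */
void baseline_long_long(void)
{
    int i;
    for (i = 0; i < N; i++)
        lla[i] = llb[i];
}
```

A driver would call fill_inputs() once, then compare
time_calls(mul_long, reps) - time_calls(baseline_long, reps) against
the same difference for the long long pair, with reps large enough
that the totals are well above the resolution of clock().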