Date: Tue, 16 Mar 1999 23:59:24 +0100
To: pgcc AT delorie DOT com
Subject: Re: Benchmarks for floating point operations
Message-ID: <19990316235924.C21166@cerebro.laendle>
Mail-Followup-To: pgcc AT delorie DOT com
References: <19990316203348 DOT A25705 AT physik DOT fu-berlin DOT de>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
In-Reply-To: <19990316203348.A25705@physik.fu-berlin.de>; from Axel Thimm on Tue, Mar 16, 1999 at 08:33:48PM +0100
X-Operating-System: Linux version 2.2.3 (root AT cerebro) (gcc driver version pgcc-2.93.09 19990221 (gcc2 ss-980929 experimental) executing gcc version 2.7.2.3) 
From: Marc Lehmann <pcg AT goof DOT com>
Reply-To: pgcc AT delorie DOT com
X-Mailing-List: pgcc AT delorie DOT com
X-Unsubscribes-To: listserv AT delorie DOT com
Precedence: bulk

On Tue, Mar 16, 1999 at 08:33:48PM +0100, Axel Thimm wrote:
> We are currently trying to see what we can drain maximally from PII for a
> certain flop intensive application (QCD). Until now folks were using gcc 2.8.1
> with -O2 -fomit-frame-pointer. I thought I might surprise them with egcs or
> pgcc, but the performance dropped from 80 to 50 Mflop/s (?)

This can be related to a variety of factors, some of which are outside the
scope of the compiler (the topic warrants a whole book of its own). Here are
the two most prominent problems:

- Double alignment. Depending on how your program allocates memory for
  doubles, they may end up aligned (fast) or misaligned (slow) purely by luck.
- Cache colouring (or the lack thereof). Sometimes moving data structures
  around will affect performance randomly (from run to run). Some algorithms
  are highly sensitive to this. Unfortunately, the compiler cannot help here.

Also, which OS are you using, and which libc (if on Linux)? Most x86
operating systems don't align the stack to an 8-byte boundary, so whether
the code runs fast or slow is again a matter of luck.

Others have also suggested higher optimization levels, which help in an
unrelated way.

You might also want to try -malign-double (and hope your libraries work
with that switch, since it changes structure layout). It aligns all doubles
in structures correctly; that rarely improves performance, but when it does,
the gain is some 30% or more.

> 
> [This was pgcc 1.1, as I cannot compile any newer snapshot/CVS, see related
> mail in this list]

I don't think it is related to that version (regardless of what I say
below).

> 
> Now I know of gcc to egcs regression, but I thought that pgcc was atop of both

There is no real regression in terms of technology, though. Unlike gcc, the
egcs and pgcc releases disable more optimizations than strictly necessary in
order to be as stable as possible (more stable than, say, gcc-2.8). The
current snapshots of both are faster than gcc on average.

> Is this a known fact? Have others made similar experiences? The program is

x86 fp performance is veeery sensitive to environment issues.

> memory intensive (small ratio of computations per memory accesses) and perhaps
> this is what makes the difference.

It might. Cache-line aliasing can make up to a 200% difference in runtime.

--  
      -----==-                                             |
      ----==-- _                                           |
      ---==---(_)__  __ ____  __       Marc Lehmann      +--
      --==---/ / _ \/ // /\ \/ /       pcg AT goof DOT com      |e|
      -=====/_/_//_/\_,_/ /_/\_\       XX11-RIPE         --+
    The choice of a GNU generation                       |
                                                         |