Date: Tue, 16 Mar 1999 23:59:24 +0100
To: pgcc AT delorie DOT com
Subject: Re: Benchmarks for floating point operations
Message-ID: <19990316235924.C21166@cerebro.laendle>
Mail-Followup-To: pgcc AT delorie DOT com
References: <19990316203348 DOT A25705 AT physik DOT fu-berlin DOT de>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
In-Reply-To: <19990316203348.A25705@physik.fu-berlin.de>; from Axel Thimm on Tue, Mar 16, 1999 at 08:33:48PM +0100
X-Operating-System: Linux version 2.2.3 (root AT cerebro) (gcc driver version pgcc-2.93.09 19990221 (gcc2 ss-980929 experimental) executing gcc version 2.7.2.3)
From: Marc Lehmann
Reply-To: pgcc AT delorie DOT com
X-Mailing-List: pgcc AT delorie DOT com
X-Unsubscribes-To: listserv AT delorie DOT com
Precedence: bulk

On Tue, Mar 16, 1999 at 08:33:48PM +0100, Axel Thimm wrote:
> We are currently trying to see what we can drain maximally from PII for a
> certain flop intensive application (QCD). Until now folks were using gcc 2.8.1
> with -O2 -fomit-frame-pointer. I thought I might surprise them with egcs or
> pgcc, but the performance dropped from 80 to 50 Mflop/s (?)

This can be related to a variety of factors, some of which are outside the
scope of the compiler (the topic warrants a whole book of its own). Here are
the two most prominent problems.

- Double alignment. Depending on how your program allocates memory for
  doubles, it can, by pure luck, change from optimal to non-optimal.

- Cache colouring (or lack thereof). Sometimes moving data structures around
  will change performance randomly (from run to run). Some algorithms are
  highly sensitive to this. Unfortunately, the compiler cannot help here.

Also, which OS are you using, and which libc (if on Linux)? Most x86 operating
systems don't align the stack to an 8-byte boundary, which again makes it a
matter of luck whether the code runs fast or slow.

Also, others have pointed out higher optimization levels that help in an
unrelated way.
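
A minimal sketch of the double-alignment point, assuming a C99 compiler (for
uintptr_t). The helper names is_aligned8 and alloc_doubles_aligned8 are
hypothetical and not part of any library. It checks whether a pointer sits on
an 8-byte boundary, and over-allocates with malloc so that the array of
doubles can be rounded up to one:

```c
#include <stdint.h>
#include <stdlib.h>

/* Returns 1 if p sits on an 8-byte boundary, 0 otherwise. */
int is_aligned8(const void *p)
{
    return ((uintptr_t)p & 7u) == 0;
}

/* Hypothetical helper: allocate room for n doubles starting on an
 * 8-byte boundary. *raw receives the pointer returned by malloc,
 * which is the one that must later be passed to free(). */
double *alloc_doubles_aligned8(size_t n, void **raw)
{
    /* Refuse sizes whose byte count (plus 7 bytes of slack) overflows. */
    if (n > (SIZE_MAX - 7) / sizeof(double)) {
        *raw = NULL;
        return NULL;
    }

    *raw = malloc(n * sizeof(double) + 7);
    if (*raw == NULL)
        return NULL;

    /* Round the address up to the next multiple of 8. */
    return (double *)(((uintptr_t)*raw + 7u) & ~(uintptr_t)7u);
}
```

Printing is_aligned8() for the address of a local double and for a freshly
malloc'ed buffer is a quick way to see whether a given OS/libc combination
hands out aligned memory. This only addresses heap data; it does nothing for
the stack alignment issue mentioned above.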

You might also want to try -malign-double (and hope your libraries work with
that switch). It will align all doubles in structures correctly (that rarely
improves performance, but when it does, it's by some 30% or more).

> > [This was pgcc 1.1, as I cannot compile any newer snapshot/CVS, see related
> mail in this list]

I don't think it is related to that version (regardless of what I said below).

> > Now I know of gcc to egcs regression, but I thought that pgcc was atop of both

There is no real regression in the technology, though. Unlike gcc, the
releases have disabled more optimization than necessary, to be as stable as
possible (more stable than, say, gcc-2.8). Both current snapshots are faster
on average than gcc.

> Is this a known fact? Have others made similar experiences? The program is

x86 fp performance is veeery sensitive to environment issues.

> memory intensive (small ratio of computations per memory accesses) and perhaps
> this is what makes the difference.

It might. Cache line aliasing can make up to 200% difference in runtime.

-- 
      -----==-                                             |
      ----==-- _                                           |
      ---==---(_)__  __ ____  __       Marc Lehmann      +--
      --==---/ / _ \/ // /\ \/ /       pcg AT goof DOT com |e|
      -=====/_/_//_/\_,_/ /_/\_\       XX11-RIPE         --+
    The choice of a GNU generation                       |
                                                         |