Mail Archives: pgcc/1999/06/04/07:27:05

Message-Id: <m10prMP-00021bC@chkw386.ch.pwr.wroc.pl>
Date: Fri, 4 Jun 99 10:38
From: strasbur AT chkw386 DOT ch DOT pwr DOT wroc DOT pl (Krzysztof Strasburger)
To: pgcc AT delorie DOT com
Subject: Re: Pgcc 1.1.3 - bad performance on P6
Reply-To: pgcc AT delorie DOT com

Marc Lehmann <pcg AT goof DOT com> wrote:
>On Wed, Jun 02, 1999 at 09:14:00AM +0000, Krzysztof Strasburger wrote:

>> The obvious remark is: the code produced by pgcc for P6 is suboptimal,
>> but why high optimizations kill the performance instead of improving it? 

>Tuning pgcc for ppro is not yet finished. But I think the bigger effect
>you see is that pgcc is tuned for integer performance. You might want
>to try out the hints in the pgcc faq on improving fp-performance (Yes,
>unfortunately you can not have both at the same time yet).
Double precision variables are already double-aligned, and there is nothing
more to unroll in the function "gausil". I repeated the test under different
conditions to remove the side effects of the function "main".
The double variables in main were declared static, and main.c was
compiled with gcc 2.7.2.3 -malign-double. Gausil.c was compiled
for _pentium_, and the different versions were run on a _pentium_ 166 with
2000000 steps (times averaged over three runs each, on an idle machine).
The following options were used everywhere:
-malign-double -mstack-align-double (for pgcc) -malign-jumps=0 -malign-loops=0
-malign-functions=0 -ffast-math
-O5 = -O6 (same code)
1. gcc 2.7.2.3 (-m486, of course ;) -O2 : t=7.21s
2. pgcc 1.1.3 -O4 : t=7.16s 
3. pgcc 1.1.3 -O6 : t=7.26s
So, I repeat: -O5/-O6 kills the performance on P5, not only on P6.
Let us look at an earlier version of pgcc (1.0.3a).
It produced only two different codes: -O2 = -O3 = -O4, -O5 = -O6
4. pgcc 1.0.3a -O(2,3,4) : t=7.05s
5. pgcc 1.0.3a -O(5,6) : t=7.16s
Hmmm... High optimization levels have always hurt FP performance. The old pgcc
gave better FP code than the new one, and this is sad. Let us look at the
latest snapshot.
Again, -O2 = -O3 = -O4 and -O5 = -O6 (of course, this is not a general rule).
6. pgcc 2.93.03 -O(2,3,4) : t=7.15s
7. pgcc 2.93.03 -O(5,6) : t=7.26s
Eh... It isn't better (in this case only, of course; I had other programs
which were faster with pgcc 2.93.03 than with pgcc 1.1.1/2 or 1.0.3).
The clear winner is the old version of pgcc. I'm going back to it.
I have a cluster of Pentiums which spend about 25% of their time
in the function "gausil".
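
To put the timings above in proportion, here is a minimal sketch in C of the
arithmetic involved. The function names (percent_slower, overall_percent,
report) are hypothetical and not part of the original test. It assumes that
the measured times are dominated by "gausil" and that the 25% share is
measured with the faster build, so the whole-program effect scales linearly
with that share.

```c
#include <stdio.h>

/* Percentage by which time t is slower than the reference time t_ref. */
double percent_slower(double t, double t_ref)
{
    return 100.0 * (t - t_ref) / t_ref;
}

/* Whole-program percent slowdown when a routine that takes `fraction`
   of the run time (measured with the reference build) becomes slower
   by `routine_percent`. Assumption: the rest of the program is unchanged. */
double overall_percent(double fraction, double routine_percent)
{
    return fraction * routine_percent;
}

/* Print one comparison line for a given build label. */
void report(const char *label, double t, double t_ref, double fraction)
{
    double p = percent_slower(t, t_ref);
    printf("%-24s %5.2f%% slower in routine, %5.2f%% overall\n",
           label, p, overall_percent(fraction, p));
}
```

For example, report("pgcc 1.1.3 -O6", 7.26, 7.05, 0.25) compares the slowest
result above with the fastest one (pgcc 1.0.3a at -O2..4): roughly 3% in the
routine, which under these assumptions is a bit under 1% of total run time.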
I really appreciate the work that the EGCS/PGCC teams do _for free_.
Please don't treat my words as flames or complaints, but I think
that an important part of the compiler is going in the wrong direction.
Many programs benefit from good FP performance (not only scientific
software).

>also, you could try a snapshot (i.e. from cvs). 1.1.x was made more for
>stableness than for performance (Yes, I know 1.1.3 is not the most stable
>release we had).
I tried the cvs server, but the transmission breaks very often, so I still
don't have the cvs version. And pgcc 1.1.3 is the first acceptable release of
1.1.x for me, because fast-math didn't work correctly with earlier versions
(or with the latest snapshot).
Krzysztof
