Mail Archives: pgcc/1999/05/20/02:37:17

Sender: jur AT rz DOT hu-berlin DOT de
Message-ID: <3743ADE8.C938ADBB@informatik.hu-berlin.de>
Date: Thu, 20 May 1999 06:38:32 +0000
From: Jens-Uwe Rumstich <rumstich AT informatik DOT hu-berlin DOT de>
Organization: TUSCON
X-Mailer: Mozilla 4.05 [en] (X11; I; Linux 2.2.8 i586)
MIME-Version: 1.0
To: pgcc AT delorie DOT com
Subject: Re: Benchmark PGCC vs EGCS on a K6-2
References: <373F3AA2 DOT A446D611 AT informatik DOT hu-berlin DOT de> <Pine DOT LNX DOT 4 DOT 10 DOT 9905181826020 DOT 1284-100000 AT data DOT mandrakesoft DOT com> <19990519105631 DOT 40676 AT atrey DOT karlin DOT mff DOT cuni DOT cz>
Reply-To: pgcc AT delorie DOT com

Hi!

First, please don't trust the numbers I posted at all. The switches
"pgcc -mk6 -O3" and "pgcc -mk6 -O4" produce the same executable, but the
results differed by 3 seconds. That is too much to call these results
reliable :-((

> About a year ago I did some tuning of egcs for the K6-2. I removed some of the
> K6-2 specific optimizations, because they seemed to produce slower code. There
> seems to be an important problem in the K6 documentation: it recommends things that often
> cause performance loss. The author of the original K6 stuff for egcs just blindly followed
> its recommendations, so many of his changes were performance misses (especially changing
> xor reg,reg to mov reg,0).

ooops... The mov is not faster?

> Many (not all) of these changes are in recent egcs snapshots (aka gcc 2.95.0). Because
> I don't have any access to this CPU anymore, I would love to hear about your results with
> this version of gcc.

I'll try them out and write about them.
Do you know a way to get exact numbers? I still don't know why my
results are that wrong :-(
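
One common way to get steadier numbers is to run each executable several
times on an otherwise idle machine and compare the minimum or median
instead of a single run. The following is only a minimal sketch under
that assumption; the function names (time_command, summarize) and the
script name bench.py are made up for illustration and are not part of
pgcc or egcs.

```python
import statistics
import subprocess
import sys
import time


def time_command(cmd, runs=10):
    """Run cmd `runs` times and return the wall-clock durations in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        # Discard the program's output so terminal I/O does not skew the timing.
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Return the minimum, the median and the relative spread of the runs."""
    fastest = min(durations)
    return {
        "min": fastest,
        "median": statistics.median(durations),
        # Spread relative to the fastest run; a large value means noisy results.
        "spread": (max(durations) - fastest) / fastest,
    }


if __name__ == "__main__":
    # Hypothetical usage: python bench.py ./my_benchmark arg1 arg2
    if len(sys.argv) < 2:
        sys.exit("usage: bench.py COMMAND [ARGS...]")
    print(summarize(time_command(sys.argv[1:])))
```

If the spread between identical runs is of the same size as the
difference between two compiler settings, the comparison says nothing
about the settings.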

> The K6 seems to have serious problems with decoding speed. I've made new haifa scheduler hooks for
> decoding that worked quite well (I also have versions for Pentium and PPro available; the PPro
> version is untested),

It seems to me that the decoders of the K6 are not strong enough to
feed all the execution units, so this is the bottleneck. One should
probably try to output instructions which result in 4 RISC ops per
cycle. That means either 2 short instructions, each of which is broken
into 2 RISC ops, or one long instruction, which is broken into 4 RISC
ops.
In the PGCC FAQ I read about a "recombining" optimization, which seems
to be intended to do exactly this. But it was marked as disabled,
because it may slow down some code...
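
To spell out the arithmetic, here is a tiny sketch of the simplified
decode model described above. The op counts are taken from this mail
only, not from the K6 documentation, and the names RISC_OPS, risc_ops
and fills_decode_width are hypothetical.

```python
# Assumption: RISC ops produced per instruction kind, as stated in this mail.
RISC_OPS = {"short": 2, "long": 4}


def risc_ops(group):
    """Total RISC ops produced by the instructions decoded in one cycle."""
    return sum(RISC_OPS[kind] for kind in group)


def fills_decode_width(group, width=4):
    """True if the group delivers the full `width` RISC ops in that cycle."""
    return risc_ops(group) == width
```

In this model both ("short", "short") and ("long",) deliver 4 RISC ops,
while a single short instruction delivers only 2 and leaves half of the
cycle unused.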

> On the K6 it brought quite large speedups (-10 - 500%, usually about 10%), but the changes necessary
> to i386.md are quite large, so it would take lots of time to add them into gcc.

And it would make i386.md look even uglier, right?? ;-)

> Honza

cu
	Jens-Uwe
