Mail Archives: pgcc/1998/01/25/12:15:00

X-POP3-Rcpt: mlehmann AT universe DOT sgh-net DOT de

25 Jan 1998 12:15:00 +0100 (CET) :

From: Ronald Wahl <Ronald DOT Wahl AT Informatik DOT TU-Chemnitz DOT DE>

X-Sender: rwa AT goliath DOT csn DOT tu-chemnitz DOT de

To: Marc Lehmann <pcg AT goof DOT com>

cc: beastium-list AT Desk DOT nl

Subject: Re: PGCC optimizing AMD K6?

In-Reply-To: <19980125021449.38760@cerebro.laendle>

Message-ID: <Pine.LNX.3.96.980125115906.11117A-100000@goliath.csn.tu-chemnitz.de>

MIME-Version: 1.0

Sender: Marc Lehmann <pcg AT goof DOT com>

Status: RO

X-Status: A

Lines: 55

On Sun, 25 Jan 1998, Marc Lehmann wrote:
> On Sat, Jan 24, 1998 at 11:50:49PM +0100, Ronald Wahl wrote:
> > Since pgcc-980122 is out, can you verify that -ffast-math
> > (w/o funroll-loops) slows down some integer benches? The neural net bench
> > still doesn't return if -funroll-loops or -funroll-all-loops is used. Has
> > anybody checked if this is a problem of egcs or only pgcc? Maybe we shoul
>
> I haven't checked it myself, but it seems to work under egcs..
>
> It might be a egcs bug, or maybe a simple incompatibility between egcs &
> pgcc, as you know, I'm debugging that /&$/$% unrolling code since a long
> time..

keep on hacking ;-)

> > PPS (for Marc): Since I've seen many fxch instructions in the assembly
> >                 output of nbench I have to note that these will not
> >                 improve performance like on a pentium. If it's possible
> >                 we should remove these. Minimizing the number of fpu
> >                 instructions should be one of the goals on a K6 since
> >                 most of these have a latency of 2 cycles and need two
> >                 cycles to execute.
>
> hmm.. that probably makes loop unrolling useless (doing two calculations
> independently requires fxch, due to the §%&$%§$%E$ x86 fpu architecture)

yes, but actually the code produced by -funroll-loops is faster. Maybe
nbench's fp benches include enough integer code so that loop unrolling
will be a win.
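
To illustrate the fxch point, here is a minimal sketch in C. The function
names sum_rolled and sum_unrolled2 are hypothetical and not taken from
nbench. Unrolling by two with independent accumulators keeps two running
values live at once; on the x87 register stack, where most arithmetic
uses st(0) as an operand, a compiler typically has to emit fxch to
alternate between them.

```c
#include <stddef.h>

/* Hypothetical example: one accumulator, so a single dependency chain.
   The running sum can stay in st(0) and no fxch is needed. */
double sum_rolled(const double *a, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Hypothetical example: unrolled by two with independent accumulators.
   Both s0 and s1 are live across the loop body, which is the situation
   that usually makes an x87 code generator insert fxch. */
double sum_unrolled2(const double *a, size_t n)
{
    double s0 = 0.0, s1 = 0.0;
    size_t i = 0;

    for (; i + 1 < n; i += 2) {
        s0 += a[i];
        s1 += a[i + 1];
    }
    if (i < n)          /* odd element left over */
        s0 += a[i];
    return s0 + s1;
}
```

Comparing the assembly of both functions (for example with gcc -S) shows
whether a given compiler version actually emits fxch for the second one.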

> We should be able to get rid of them by defining no parallelity for the
> fp unit in the .md file,

...but I hope this doesn't mean that integer code cannot run in parallel
with fp code...

> but since no instructions are marked with an attribute to do this, this
> won't have much of an effect.

Then we should marc^Hk the relevant instructions. Is there anybody here
who will have a look at it? My time is limited and the .md file is too
huge.

ron

-- 
\ Ronald Wahl --- rwa AT informatik DOT tu-chemnitz DOT de   \
 \ WWW: http://www.tu-chemnitz.de/~row             \
  \ Talk: rwa AT goliath DOT csn DOT tu-chemnitz DOT de            \
   \ PGP key available by finger to my email address \
