Date: Tue, 29 Feb 2000 20:39:12 +0100
From: Martin Ockajak <mandos AT hq DOT alert DOT sk>
To: pgcc AT delorie DOT com
Subject: Re: short add stuff
Message-ID: <20000229203912.A4098@hq.alert.sk>
References: <38B426AF DOT 280BF1C0 AT sgi DOT com> <20000224173809 DOT A32390 AT hq DOT alert DOT sk> <38B60A5F DOT D83E8C12 AT sgi DOT com>
Mime-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
	protocol="application/pgp-signature"; boundary="GvXjxJ+pjyke8COw"
X-Mailer: Mutt 1.0i
In-Reply-To: <38B60A5F.D83E8C12@sgi.com>; from law@sgi.com on Thu, Feb 24, 2000 at 08:51:43PM -0800
Reply-To: pgcc AT delorie DOT com
Errors-To: dj-admin AT delorie DOT com
X-Mailing-List: pgcc AT delorie DOT com
X-Unsubscribes-To: listserv AT delorie DOT com
Precedence: bulk


--GvXjxJ+pjyke8COw
Content-Type: text/plain; charset=us-ascii

On Thu, Feb 24, 2000 at 08:51:43PM -0800, Linda Walsh wrote:
> I dump the assembly code and get these results on SuSE:
> Reading specs from /usr/lib/gcc-lib/i486-linux/egcs-2.91.66/specs
> gcc version egcs-2.91.66 19990314/Linux (egcs-1.1.2 release) 

There is no need to specify the distribution. The relevant information is
the gcc (pgcc, egcs) version and sometimes the versions of the external
utilities that gcc uses (binutils etc.).

> gcc version 2.95.2 19991024 (release)    
> For pentium, I get the .4 alignment and the movw.
> 386: .align 4, and the movzwl (?!)
> 486: .align 16, and back to the 'movw'
> pentiumpro: same as 386, .4 and movzwl
> ---
> 	My timing tests were all on a P-III.  When optimizing for
> the pentpro, it *slowed* down by 25%.  So it appears the "movzwl"
> instruction is slower on a PIII and maybe a PII.  It's hard to imagine
> a movw, which does less actually being slower. in any circumstance.
> 	So if you notice, no xor's are needed, so zeroing the high
> portion would seem to be just a waste of time unless the movzwl is actually
> faster than a movw (?).

As Wolfgang pointed out, some x86 CPUs are quite sensitive to
operand-size and address-size override prefixes. This is probably
the reason why the movws aren't efficient on such CPUs. On Pentium Pro /
II / III, movxx takes only one micro-op and decodes in the simple
decoders, so it isn't slower than a single movw.
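
To make the comparison concrete, here is a minimal sketch of the kind of
16-bit addition loop being discussed. It is not the aim9 source; the
function name add_shorts and the test data are hypothetical. In 32-bit
code a movw load carries the 0x66 operand-size prefix, while movzwl
loads the same 16 bits zero-extended into a full 32-bit register
without that prefix.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for a "short add" benchmark kernel (not aim9 code).
 * Each iteration loads two 16-bit values; depending on the target CPU
 * options, the compiler may load them with movw or with movzwl.
 */
unsigned short add_shorts(const unsigned short *a,
                          const unsigned short *b,
                          size_t n)
{
    unsigned short sum = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        /* The arithmetic is done in int, then truncated back to 16 bits,
           so the result wraps modulo 65536. */
        sum = (unsigned short)(sum + a[i] + b[i]);
    }
    return sum;
}
```

Compiling this with -O2 -S, once with -mcpu=i586 and once with
-mcpu=i686, and comparing the loads in the two assembly listings should
show which form a given compiler version picks. The exact output depends
on the gcc version, so treat any difference as something to measure
rather than assume.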

> I'm just upset that the aim9 benchmark wasn't faster across the board
> with pentiumpro optimization on a PIII or a PII -- I checked both and
> the "-mpentium" optimization slowed down short integer addition
> significantly over the -m486, so I'm just trying to track down the
> source of the problem.

AFAIK the problem isn't in the types of insns gcc uses, but mostly in
the absence of a good low-level insn reordering pass (in (p)gcc 2.95.x).
(Fluent decoding seems to be _very_ important to the latest CPUs.)
In comparison, there are special reordering passes for both Pentium
and Pentium Pro / II / III in the current development snapshots of gcc
2.96. These greatly improve performance. (For example, I noticed a
roughly 15% performance improvement on a Pentium II with -mcpu=i686
compared to -mcpu=i586.)


Regards
-- 
Martin Ockajak a.k.a. Mandos  <mandos AT hq DOT alert DOT sk>  http://hq.alert.sk/~mandos
"The goal of Computer Science is to build something that will last at
least until we've finished building it."

--GvXjxJ+pjyke8COw
Content-Type: application/pgp-signature

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.0 (GNU/Linux)
Comment: For info see http://www.gnupg.org

iEYEARECAAYFAji8IGAACgkQ04YFujOC4BPOIACghGTXIENnh4TruU9yvbCta7bU
C10AnjANSfBTg8udkUNU3AbTlVXORDOh
=dGKZ
-----END PGP SIGNATURE-----

--GvXjxJ+pjyke8COw--