Date: Tue, 29 Feb 2000 20:39:12 +0100
From: Martin Ockajak
To: pgcc AT delorie DOT com
Subject: Re: short add stuff
Message-ID: <20000229203912.A4098@hq.alert.sk>
References: <38B426AF DOT 280BF1C0 AT sgi DOT com> <20000224173809 DOT A32390 AT hq DOT alert DOT sk> <38B60A5F DOT D83E8C12 AT sgi DOT com>
Mime-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="GvXjxJ+pjyke8COw"
X-Mailer: Mutt 1.0i
In-Reply-To: <38B60A5F.D83E8C12@sgi.com>; from law@sgi.com on Thu, Feb 24, 2000 at 08:51:43PM -0800
Reply-To: pgcc AT delorie DOT com
Errors-To: dj-admin AT delorie DOT com
X-Mailing-List: pgcc AT delorie DOT com
X-Unsubscribes-To: listserv AT delorie DOT com
Precedence: bulk

--GvXjxJ+pjyke8COw
Content-Type: text/plain; charset=us-ascii

On Thu, Feb 24, 2000 at 08:51:43PM -0800, Linda Walsh wrote:
> I dump the assembly code and get these results on SuSE:
> Reading specs from /usr/lib/gcc-lib/i486-linux/egcs-2.91.66/specs
> gcc version egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)

There is no need to specify the distribution. The relevant information is
the gcc (pgcc, egcs) version and sometimes the versions of the external
utilities that gcc uses (binutils etc.).

> gcc version 2.95.2 19991024 (release)
> For pentium, I get the .4 alignment and the movw.
> 386: .align 4, and the movzwl (?!)
> 486: .align 16, and back to the 'movw'
> pentiumpro: same as 386, .4 and movzwl
> ---
> My timing tests were all on a P-III. When optimizing for
> the pentpro, it *slowed* down by 25%. So it appears the "movzwl"
> instruction is slower on a PIII and maybe a PII. It's hard to imagine
> a movw, which does less actually being slower in any circumstance.
> So if you notice, no xor's are needed, so zeroing the high
> portion would seem to be just a waste of time unless the movzwl is actually
> faster than a movw (?).

As Wolfgang pointed out, some x86 CPUs are pretty sensitive to operand-size
and address-size change prefixes.
This is probably why the movws aren't effective on such CPUs. On
Pentium Pro / II / III, movxx takes only one micro-op and decodes in the
simple decoders, so it isn't slower than a single movw.

> I'm just upset that the aim9 benchmark wasn't faster across the board
> with pentiumpro optimization on a PIII or a PII -- I checked both and
> the "-mpentium" optimization slowed down short integer addition
> significantly over the -m486, so I'm just trying to track down the
> source of the problem.

AFAIK the problem isn't in the types of insns gcc uses, but mostly in the
absence of a good low-level insn reordering pass (in (p)gcc 2.95.x).
(Smooth decoding seems to be _very_ important to the latest CPUs.)
In comparison, there are special reordering passes for both Pentium and
Pentium Pro / II / III in the current development snapshots of gcc 2.96.
These greatly improve performance. (For example, I noticed a roughly 15%
performance improvement on a Pentium II with -mcpu=i686 compared to
-mcpu=i586.)

Regards

--
Martin Ockajak a.k.a. Mandos
http://hq.alert.sk/~mandos

"The goal of Computer Science is to build something that will last at
least until we've finished building it."

--GvXjxJ+pjyke8COw
Content-Type: application/pgp-signature

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.0 (GNU/Linux)
Comment: For info see http://www.gnupg.org

iEYEARECAAYFAji8IGAACgkQ04YFujOC4BPOIACghGTXIENnh4TruU9yvbCta7bU
C10AnjANSfBTg8udkUNU3AbTlVXORDOh
=dGKZ
-----END PGP SIGNATURE-----

--GvXjxJ+pjyke8COw--