Sender: law AT sgi DOT com
Message-ID: <38B60A5F.D83E8C12@sgi.com>
Date: Thu, 24 Feb 2000 20:51:43 -0800
From: Linda Walsh
To: pgcc AT delorie DOT com
Subject: Re: short add stuff
References: <38B426AF DOT 280BF1C0 AT sgi DOT com> <20000224173809 DOT A32390 AT hq DOT alert DOT sk>
Reply-To: pgcc AT delorie DOT com

This was my test routine:

    void short_add(void)
    {
        static short int i, j, k, l;
        int loop;

        i = 0; j = 1; k = 2; l = 3;
        for (loop = 0; ++loop <= 10000; ) {
            i = j + k;
            j += l;
            i += 2;
            j += i;
        }
    }

----
So all the adds are on short ints. I dumped the assembly code and got these results.

On SuSE:

    Reading specs from /usr/lib/gcc-lib/i486-linux/egcs-2.91.66/specs
    gcc version egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)

In the default (486) version, I get .align 16 directives and movw instructions. For example, for i=j+k, I get:

    movw j->ax
    movw k->dx
    addl edx+ecx->ecx
    movw cx->i

So it performs a 32-bit (long) addition but uses 16-bit word moves.

If I set -m386, I get .align 4 (expected) and the same instructions.
If I set -mpentium, I get .align 4 (unexpected) and the same instructions.
If I set -mpentiumpro, I get .align 4 (unexpected) and movzwl for the first two moves above.

--------
On Mandrake:

    Reading specs from /usr/lib/gcc-lib/i586-mandrake-linux/2.95.2/specs
    gcc version 2.95.2 19991024 (release)

    pentium:     .align 4,  movw
    386:         .align 4,  movzwl (?!)
    486:         .align 16, back to movw
    pentiumpro:  same as 386, .align 4 and movzwl

---
On RH:

    Reading specs from /usr/lib/gcc-lib/i386-redhat-linux/egcs-2.91.66/specs
    gcc version egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)

    386:      .align 4,  movw
    486:      .align 16, movw
    pentium:  .align 4,  movw
    pentpro:  .align 4,  movzwl

---
My timing tests were all on a P-III. When optimizing for the pentpro, the routine *slowed* down by 25%.
So it appears the movzwl instruction is slower on a PIII, and maybe on a PII. It's hard to imagine a movw, which does less, actually being slower in any circumstance.

So is the 486 the only processor that requires .align 16? And that movzwl for 386 on Mandrake would seem to be wrong.

Martin Ockajak wrote:
> xorl %reg0,%reg0
> movw disp(%reg0),%reg1
>
> or single
>
> movzwl disp(%reg0),%reg1

---
Notice that no xors are needed (the listing above has none), so zeroing the high portion would seem to be just a waste of time unless movzwl is actually faster than movw (?).

> Surely not on Pentium.

---
But what if the xor isn't needed?

I'm just upset that the aim9 benchmark wasn't faster across the board with pentiumpro optimization on a PIII or a PII. I checked both, and the "-mpentium" optimization slowed down short integer addition significantly compared with -m486, so I'm just trying to track down the source of the problem.

-linda
--
Linda Walsh @ SGI                  | Core Linux - Trust Technology
1200 Crittenden Lane MS:30-3-802   | Voice: (650) 933-5338
Mountain View, CA 94043            | Email: law AT sgi DOT com