Sender: wolfi AT netsurf213 DOT neuss DOT netsurf DOT de
Message-ID: <391856CE.6BE0606B@neuss.netsurf.de>
Date: Tue, 09 May 2000 20:19:58 +0200
From: Wolfgang Formann
X-Mailer: Mozilla 4.6 [en] (X11; I; Linux 2.2.8 i586)
X-Accept-Language: German, de, en
MIME-Version: 1.0
To: pgcc AT delorie DOT com
Subject: Re: pgcc and egcs alignment -- function, basic block and string
References: <20000130211158 DOT D641 AT cerebro DOT laendle> <20000203131955 DOT D12247 AT atrey DOT karlin DOT mff DOT cuni DOT cz> <389C6000 DOT 5B79248 AT neuss DOT netsurf DOT de> <3917AF5A DOT FF5C82B2 AT neuss DOT netsurf DOT de> <20000509131501 DOT B27958 AT atrey DOT karlin DOT mff DOT cuni DOT cz>
Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-Encoding: 8bit
Reply-To: pgcc AT delorie DOT com

Jan,

surely not. I changed only this sequence (output from gas):

 116 0080 0FB6CE           movzbl %dh, %ecx
 117
 118 0083 333C9D00         xorl some_mem(,%ebx,4),%edi
 118      180000
 119 008a 333C8D00         xorl some_mem(,%ecx,4),%edi
 119      0C0000
 120 0091 88D3             movb %dl, %bl

to

 116 0080 88F1             movb %dh, %cl
 117
 118 0082 333C9D00         xorl some_mem(,%ebx,4),%edi
 118      180000
 119 0089 333C8D00         xorl some_mem(,%ecx,4),%edi
 119      0C0000
 120 0090 0FB6DA           movzbl %dl, %ebx

Nothing else was changed, and this gave a speedup of about 2 CPU cycles. Admittedly, one of those cycles might come from something else, such as different parallelism resulting from the first change.

I also tried inserting a single nop before the start of the loop, causing every instruction that could cross a 16-byte boundary to do so. Again: two additional cycles. (The loop is executed about 400 times, so I think I can safely ignore the additional time needed to decode and execute the one nop instruction.)

Since nothing else was changed (even the overall code length is the same) and the timings are reproducible, the only explanation is that the Athlon has the same problems as my good old K6 :-( Or at least similar problems.
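To make the boundary argument concrete, here is a small Python sketch that checks which instructions in the two gas listings straddle a 16-byte boundary. It assumes the listed offsets are relative to a 16-byte-aligned section start (the real placement depends on the assembler and linker) and treats 16 bytes as the relevant fetch/decode window, which is the hypothesis under test here rather than anything AMD has confirmed. The function name `crossing` and the tuple layout are purely illustrative.

```python
# Hypothetical helper: report which instructions straddle a 16-byte boundary.
# Offsets and lengths are copied from the two gas listings; the section start
# is ASSUMED to be 16-byte aligned.

BLOCK = 16  # assumed fetch/decode window size in bytes


def crossing(listing, shift=0, block=BLOCK):
    """Return the text of each instruction whose bytes span a block boundary.

    listing: sequence of (offset, length_in_bytes, text) tuples.
    shift:   extra bytes placed before the code (1 for a single nop).
    """
    result = []
    for offset, length, text in listing:
        start = offset + shift
        last = start + length - 1  # address of the instruction's last byte
        if start // block != last // block:
            result.append(text)
    return result


before = [
    (0x80, 3, "movzbl %dh, %ecx"),
    (0x83, 7, "xorl some_mem(,%ebx,4),%edi"),
    (0x8A, 7, "xorl some_mem(,%ecx,4),%edi"),
    (0x91, 2, "movb %dl, %bl"),
]

after = [
    (0x80, 2, "movb %dh, %cl"),
    (0x82, 7, "xorl some_mem(,%ebx,4),%edi"),
    (0x89, 7, "xorl some_mem(,%ecx,4),%edi"),
    (0x90, 3, "movzbl %dl, %ebx"),
]

print(crossing(before))          # ['xorl some_mem(,%ecx,4),%edi']
print(crossing(after))           # []
print(crossing(after, shift=1))  # ['xorl some_mem(,%ecx,4),%edi']
```

In the original order the second `xorl` (7 bytes starting at 0x8a) ends at 0x90 and so crosses a boundary; after the reordering it ends at 0x8f and `movzbl` starts exactly at 0x90. Shifting the reordered code by one byte, as the single-nop experiment does, makes that `xorl` cross again. The sketch covers only these four instructions, not the whole loop.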
Wolfgang

Jan Hubicka wrote:
>
> > Jan,
> >
> > seems to be the same with Athlon, at least with this one
> > vendor_id       : AuthenticAMD
> > cpu family      : 6
> > model           : 2
> > model name      : AMD Athlon(tm) Processor
> > stepping        : 1
> > cpu MHz         : 698.660058
> >
> > here again, I got some speedups when I rearranged the code to have no
> > instructions crossing any 16-byte border.
> OK. I will ask AMD about this issue. Alex from AMD claims that Athlon doesn't
> have such problems. It is quite possible that the speedups are caused by some
> accidental change elsewhere...
>
> Honya
>