Date: Thu, 27 Feb 1997 18:42:45 +0200 (IST)
From: Eli Zaretskii
To: Jesse Bennett
cc: djgpp AT delorie DOT com
Subject: Re: Netlib code [was Re: flops...]
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII

On Wed, 26 Feb 1997, Jesse W. Bennett wrote:

> I tried this on a Linux box with gcc 2.6.3 and 2.7.2 and the results were
> encouraging, but the pointer based code was still slightly faster.

Did you try experimenting with the various optimization-related switches
to gcc?  There is a plethora of them, all described in the section called
"Optimize Options" of the gcc on-line docs.  I suggest trying those which
seem relevant to your inner loops, looking at the generated assembly and
timing the results, until you find the best combination.

> L13:
>         movl (%edi),%edx
>         movl (%esi),%eax
>         fld %st(0)
>         fmull (%eax,%ecx,8)
>         faddl (%edx,%ecx,8)
>         fstpl (%edx,%ecx,8)
>         incl %ecx
>         cmpl %ecx,12(%ebp)
>         jg L13
>
> It is not clear to me why the edx and eax registers are being reloaded
> each iteration.

Maybe because GCC has to allow for `a' or `b' being the same as `c' at
the caller side?  Try declaring `a' and `b' const and see if that helps.
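A sketch of the reload problem and one workaround, under stated assumptions: the actual source loop is not shown in the thread, so the function names, the `double **' row-pointer layout and the parameters below are hypothetical, chosen only to match the shape of the quoted assembly (a y += s*x update where both array addresses are fetched from memory on each pass).  The first function indexes through the row-pointer arrays inside the loop; the second copies the row pointers into local variables before the loop, a complementary approach that does not rely on the compiler drawing conclusions from `const'.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical loop shape: add s times row xi of A to row yi of C,
   where both matrices are stored as arrays of row pointers. */

/* Indexed version: the row pointers c[yi] and a[xi] are read inside
   the loop.  Unless the compiler can prove that storing a double into
   c[yi][j] never overwrites those pointers, it may have to fetch them
   again on every iteration, like the two movl loads in the quoted loop. */
void axpy_rows_indexed(double **c, int yi, double *const *a, int xi,
                       double s, int n)
{
    int j;
    assert(c != NULL && a != NULL && n >= 0);
    for (j = 0; j < n; j++)
        c[yi][j] += s * a[xi][j];
}

/* Hoisted version: copy the row pointers into locals once.  Their
   addresses are never taken, so no store through y can change them and
   the compiler is free to keep them in registers for the whole loop.
   This assumes the row data never overlaps the pointer arrays, which is
   the normal case. */
void axpy_rows_hoisted(double **c, int yi, double *const *a, int xi,
                       double s, int n)
{
    double *y;
    const double *x;
    int j;
    assert(c != NULL && a != NULL && n >= 0);
    y = c[yi];
    x = a[xi];
    for (j = 0; j < n; j++)
        y[j] += s * x[j];
}
```

Both functions compute the same result; compiling each with `gcc -O2 -S' and comparing the loop bodies is one way to check whether the two pointer loads have moved out of the loop.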