Mail Archives: djgpp/1997/02/27/11:54:57

Date: Thu, 27 Feb 1997 18:42:45 +0200 (IST)
From: Eli Zaretskii <eliz AT is DOT elta DOT co DOT il>
To: Jesse Bennett <jbennett AT ti DOT com>
cc: djgpp AT delorie DOT com
Subject: Re: Netlib code [was Re: flops...]
In-Reply-To: <Pine.LNX.3.91.970226105830.29585A-100000@lenny.dseg.ti.com>
Message-ID: <Pine.SUN.3.91.970227184200.2124C-100000@is>
MIME-Version: 1.0

On Wed, 26 Feb 1997, Jesse W. Bennett wrote:

> I tried this on a Linux box with gcc 2.6.3 and 2.7.2 and the results were
> encouraging, but the pointer based code was still slightly faster.

Did you experiment with gcc's various optimization-related switches?
There is a plethora of them, all described in the section called
"Optimize Options" of the gcc on-line docs.  I suggest trying those
that seem relevant to your inner loops, looking at the generated
assembly and timing the results, until you find the best combination.

> L13:
>         movl (%edi),%edx
>         movl (%esi),%eax
>         fld %st(0)
>         fmull (%eax,%ecx,8)
>         faddl (%edx,%ecx,8)
>         fstpl (%edx,%ecx,8)
>         incl %ecx
>         cmpl %ecx,12(%ebp)
>         jg L13
> 
> It is not clear to me why the edx and eax registers are being reloaded 
> each iteration.

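Maybe because GCC has to allow for `a' or `b' being the same as `c'
on the caller's side?  If they can overlap, a store through one
pointer might change what the others point to, so the compiler plays
it safe and reloads them.  Try declaring `a' and `b' const and see if
that helps.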
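
To make the aliasing point concrete, here is a minimal sketch.  The
original C source is not quoted in this thread, so the loop shape is
only a guess reconstructed from the assembly above (one scaled row
added into another), and the function names, the `double **'
parameters, and the index arguments are all hypothetical.  The second
version copies the two row pointers into locals before the loop, which
is a different technique from adding const: it tells the compiler
directly that the row pointers do not change inside the loop.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reconstruction: the row pointers c[i] and b[j] are read
   through memory inside the loop.  A compiler that cannot prove the
   stores to c[i][k] leave those pointers untouched may reload them on
   every iteration. */
void axpy_rows_reload(double **c, double **b, int i, int j, double s, int n)
{
    int k;

    for (k = 0; k < n; k++)
        c[i][k] += s * b[j][k];
}

/* Same arithmetic, with the row pointers hoisted into locals.  This is
   equivalent to the version above only if the stores never overwrite
   the row-pointer arrays themselves (an assumption of this sketch). */
void axpy_rows_hoisted(double **c, double **b, int i, int j, double s, int n)
{
    double *dst = c[i];
    const double *src = b[j];
    int k;

    assert(dst != NULL && src != NULL);

    for (k = 0; k < n; k++)
        dst[k] += s * src[k];
}
```

Compiling both with `gcc -O2 -S' and comparing the inner loops should
show whether the two loads at the top of the loop go away; whether
they do will depend on the gcc version and switches used.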
