Message-ID: <32BDBC44.49C6@pobox.oleane.com>
Date: Sun, 22 Dec 1996 23:55:00 +0100
From: Francois Charton
Organization: CCMSA
MIME-Version: 1.0
To: djgpp AT delorie DOT com
CC: Eli Zaretskii
Subject: Re: Is DJGPP that efficient?
References:
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

Eli Zaretskii wrote:
>
> On Fri, 20 Dec 1996, Francois Charton wrote:
>
> > and by
> > x2=x*x;
> > co=1.0+x2*(-0.4999999963 + x2*(0.0416666418 + x2*(-0.0013888397 +
> > x2*(0.0000247609 - x2*0.0000002605))));
>
> Any serious general-purpose fp code cannot assume that the
> argument is between 0 and PI/2, and most of the time of the library
> functions is spent in the so-called argument reduction process (which
> brings the argument to a narrow region around 0 where a simple
> approximation can be used).

Yes. I suggested this function as a way to speed up an application which
*really* needs it, not as a general-purpose replacement for the cos()
function (which is good enough as it is). However, the [0,PI/2]
restriction (actually [-PI/2,PI/2], as the cosine is an even function)
is not as silly as, say, a square root which would stop at 5... In many
applications it is fairly easy to rewrite your code so that it
guarantees the argument stays within those bounds. This puts an extra
burden on the programmer, but the gain in speed can be worth the bother.

Finally, while my first example was just that, an example, the second
one is a serious formula, and it is very easy to extend it to a larger
domain: here is a "valid everywhere" cos() function which, on my 486DX4,
compiled with DJGPP at -O3, runs about 10-15% faster than the libc and
libm cos() functions.
#define MYIPI_S2 0.6366197724   /* 2/PI  */
#define MYPI_S2  1.57079632679  /* PI/2  */

double mycos(double f)
{
  double f2;
  double co;
  int i1;

  /* Reduce the argument to [-PI/2, PI/2]: find the nearest even
     multiple of PI/2 and subtract it, rounding i1 away from zero so
     that negative arguments are reduced correctly too. */
  i1 = (int)(f*MYIPI_S2);
  if (i1 & 1)
    i1 += (f < 0.0) ? -1 : 1;
  f -= i1*MYPI_S2;

  f2 = f*f;
  co = 1.0 + f2*(-0.4999999963 + f2*(0.0416666418 + f2*(-0.0013888397 +
       f2*(0.0000247609 - f2*0.0000002605))));

  /* Quadrants 2 and 3 flip the sign. */
  return (i1 & 2) ? -co : co;
}

BTW, I noticed one funny thing while working on this example: if I use
floats (32-bit, lower precision, better aligned...) instead of doubles,
mycos() runs *slower*. It seems to be due to the FPU, which loses time
converting floats to ints. Is this DJGPP-specific, or common to any
"Intel Inside" (you have been warned) machine?

Regards, and Joyeux Noel,
Francois