Mail Archives: djgpp/1996/12/16/07:51:39

Message-ID: <32B53FBE.1B0A@pobox.oleane.com>
Date: Mon, 16 Dec 1996 13:25:34 +0100
From: Francois Charton <deef AT pobox DOT oleane DOT com>
Organization: CCMSA
MIME-Version: 1.0
To: murray AT southeast DOT net
CC: djgpp AT delorie DOT com
Subject: Re: math optimization
References: <32b26866 DOT 241305048 AT nntp DOT southeast DOT net>

Murray Stokely wrote:
> 
>         This is a little code snippet from a vesa lens effect I coded,
> The diameter and magnification of the lens are adjustable in realtime
> via the +/- keys so I need the lens calculating function as fast as
> possible.  Someone gave me a good tip with my sqrt problem earlier in
> using the difference of squares to take out one of the multiplies but
> I'd like to take it a step further.  This routine is actually faster
> than I expected it would be on my 486dx4, but still every clock cycle
> counts ;)  I know there's lots of room for improvement, so any tips
> would be appreciated.
> 

Here are a few ideas (which should make your code a bit faster)... 


#include <stdlib.h>   /* abs() */
#include <math.h>     /* sqrt() */

/* tfm[] is the lookup table declared elsewhere in your program */
void calculate_tfm(int diameter, char magnification)
 {
         int a,b,x,y,z,s;
         int y2,x2y2;
         int radius,rad2;
         radius=diameter/2;
         /* this is used a lot, let's precalculate */
         rad2=radius*radius;
         /* save a sqrt() */
         s=abs(rad2 - (magnification * magnification));
         /* use square symmetry : 4 times fewer calculations */
         y2=0;   /* y2 holds y*y */
         for(y=0;y<radius;y++)
         {
                 x2y2=y2;   /* x2y2 holds x*x+y*y, restarted at x=0 for each row */
                 for(x=0;x<radius;x++)
                 {
                 if (x2y2 >= s)
                     {
/* this can be improved : for x or y = 0 some values are calculated too
many times. maybe also the condition x2y2>=s can be taken out of the loop
*/

tfm[(y+radius)*diameter+(x+radius)]=(y+radius)*diameter+(x+radius);
tfm[(-y+radius)*diameter+(x+radius)]=(-y+radius)*diameter+(x+radius);
tfm[(y+radius)*diameter+(-x+radius)]=(y+radius)*diameter+(-x+radius);
tfm[(-y+radius)*diameter+(-x+radius)]=(-y+radius)*diameter+(-x+radius);
                     } else {

                         z=round(sqrt(rad2-x2y2));
                         a=round(x*magnification/z);
                         b=round(y*magnification/z);
tfm[(y+radius)*diameter+(x+radius)]=(b+radius)*diameter+(a+radius);
tfm[(-y+radius)*diameter+(x+radius)]=(-b+radius)*diameter+(a+radius);
tfm[(y+radius)*diameter+(-x+radius)]=(b+radius)*diameter+(-a+radius);
tfm[(-y+radius)*diameter+(-x+radius)]=(-b+radius)*diameter+(-a+radius);
                     }
/* I'm not sure whether these recursive formulae are useful, but I love
them : (x+1)^2 = x^2 + 2x + 1, so step only after x2y2 has been used */
                 x2y2+=2*x+1;
                 } // end of for x
                 y2+=2*y+1;   /* y2 becomes (y+1)^2 for the next row */
         } // end of for y
 }
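
To make the recursive formulae in the loop above easier to check, here is a
small stand-alone sketch. It is only an illustration, not part of the lens
code: fill_squares() and first_x_outside() are names I made up for it. The
first shows the (k+1)^2 = k^2 + 2k + 1 recurrence on its own, the second
shows one possible way to find, for a given row, the first x at which
x2y2 >= s, so that the test could be taken out of the inner loop.

```c
/* Fill sq[0..n-1] with 0, 1, 4, 9, ... using additions only.
   Relies on (k+1)^2 = k^2 + 2k + 1. */
void fill_squares(int *sq, int n)
{
    int k, k2 = 0;              /* k2 holds k*k for the current k */
    for (k = 0; k < n; k++) {
        sq[k] = k2;             /* use k*k first ...              */
        k2 += 2 * k + 1;        /* ... then step to (k+1)*(k+1)   */
    }
}

/* Hypothetical helper: smallest x in [0, radius] with x*x + y2 >= s,
   or radius if no such x exists below radius.
   y2 is y*y for the current row, s is the threshold from the lens code. */
int first_x_outside(int y2, int s, int radius)
{
    int x = 0, x2y2 = y2;       /* x2y2 holds x*x + y*y */
    while (x < radius && x2y2 < s) {
        x2y2 += 2 * x + 1;
        x++;
    }
    return x;
}
```

With such a boundary, a row could be split into two plain loops (x below the
boundary gets the magnified mapping, the rest gets the identity mapping)
instead of testing x2y2 >= s for every pixel. Whether that is actually
faster on a 486 would have to be measured.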
 
> ( I'll eventually convert all doubles/floats to 16.16 fixed point, so
> skip that obvious MAJOR optimization )

Do you really need 16.16? That leaves little room for the integer part,
especially if you have to compute squares, products and things like that...
Isn't 22.10 enough (that's about three decimal places), or even 25.7 (about
two decimal places)?
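
To put numbers on this, here is a rough sketch assuming signed 32-bit values
and a 64-bit intermediate for the multiply. The helper names (fx_from_int,
fx_mul, fx_max_int) are invented for the example and do not come from any
library. The parameter frac is the number of fractional bits: 16 for 16.16,
10 for 22.10, 7 for 25.7.

```c
#include <stdint.h>

/* Convert a whole number to fixed point with `frac` fractional bits.
   The caller must make sure the result fits in 32 bits. */
int32_t fx_from_int(int32_t v, int frac)
{
    return (int32_t)((int64_t)v * ((int64_t)1 << frac));
}

/* Multiply two fixed-point values through a 64-bit intermediate.
   Assumes >> on a negative value is an arithmetic shift (true for gcc). */
int32_t fx_mul(int32_t a, int32_t b, int frac)
{
    return (int32_t)(((int64_t)a * b) >> frac);
}

/* Largest whole number that still fits in the format:
   32767 for frac = 16, 2097151 for frac = 10, 16777215 for frac = 7. */
int32_t fx_max_int(int frac)
{
    return INT32_MAX >> frac;
}
```

For instance, with a radius of 200 the square 40000 is already larger than
fx_max_int(16), so rad2 cannot be held in 16.16, while in 22.10
fx_mul(fx_from_int(200, 10), fx_from_int(200, 10), 10) gives the same value
as fx_from_int(40000, 10). The price is resolution: 1/1024 per step in 22.10
and 1/128 in 25.7, against 1/65536 in 16.16.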


Francois
