Mail Archives: djgpp/1996/11/28/01:07:30

From: Elliott Oti <e DOT oti AT stud DOT warande DOT ruu DOT nl>
Newsgroups: comp.os.msdos.djgpp
Subject: Re: Optimization
Date: Wed, 27 Nov 1996 07:49:10 -0800
Organization: Academic Computer Centre Utrecht, (ACCU)
Lines: 62
Message-ID: <329C62F6.23F6@stud.warande.ruu.nl>
References: <57hg9b$or5 AT kannews DOT ca DOT newbridge DOT com> <329C4CD4 DOT 7474 AT cornell DOT edu> <Pine DOT SUN DOT 3 DOT 90 DOT 961127095705 DOT 25056B-100000 AT coop10>
NNTP-Posting-Host: warande1078.warande.ruu.nl
Mime-Version: 1.0
To: djgpp AT delorie DOT com
DJ-Gateway: from newsgroup comp.os.msdos.djgpp

Glen Miner wrote:
> 
> > > djgpp? I mean, right now I have a 9x9 array of bytes, and try to index
> > > all
> >
> > you might be better off just using ints (32 bits) instead of 8, or
> > worse, 16 bit quantities.
> 
> What makes you say that? I can't see how this would make it faster...
> more cache misses, and an extra shift to index non-byte sized quantities.
> Not to mention the fact that there are more byte sized registers.

I believe in 32-bit protected mode most dword register ops are faster
than the equivalent 16-bit ones on a 486 and above. Certainly on a P6
16-bit instructions are disproportionately slow.  
In any case I haven't seen djgpp generate any optimizations which utilise
the byte registers; AFAIK it uses them only in straightforward byte ops.

> > > Since this code _is_ a port, I have noticed some data conversion too.
> > > An int is no longer 16 bits long. Since I don't really need a 32 bit
> > > int in the core code would it help at all for me to change things to
> > > "short unsigned int"s?

Unless you *need* the memory don't bother. Offhand, though, I would say 
you can hardly expect a 100%-200% speed increase simply from switching
data types; for that type of increase you really need to look at the 
algorithm and the way it's implemented.

> > did you actually profile your code to see where the bottlenecks are?
> 
> Yes. I know exactly where I need to improve.

I have no idea how good your C coding skills are, so don't be offended,
but careful C code can speed up a sloppy implementation by ~ 100%;
on the other hand, there are limits.
Check your algorithm to see what basic operations are being used
(specifically multiplies, divides, sqrts etc) and check how many
operations are duplicated in such a way that they can be removed with 
a little recoding -
e.g.   a1 = b1/(x*y);            c = x*y;
       a2 = b2/(x*y);   ===>     a1 = b1/c; a2 = b2/c;
       a3 = b3/(x*y);            a3 = b3/c; etc.
Simplistic, but you get the point.
Check if a series of operations can be replaced by a sequence of differential
ops -
e.g.
 for (i = 0; i < 100; i++) a[320*i + x] = 5;
becomes
 for (i = x; i < 32000 + x; i += 320) a[i] = 5;   (saves 100 multiplies)
and so on.

Other optimizations: loop unrolling, hauling unnecessary conditional
branching out of loops and using multiple loops, one for each branch, data
caching, lookup tables of the most commonly done ops.

If you already did all this you're probably close to the limits a C implementation
of your algorithm can provide. Time to reexamine the algorithm itself.
Confucius he say "All optimizations are but naught, if your algorithm sucks".
If you didn't know all this, post ( or email) the pieces of code that need
doing and I'll have a go at it.
 
Elliott
