From: Elliott Oti
Newsgroups: comp.os.msdos.djgpp
Subject: Re: Optimization
Date: Wed, 27 Nov 1996 07:49:10 -0800
Organization: Academic Computer Centre Utrecht (ACCU)
Lines: 62
Message-ID: <329C62F6.23F6@stud.warande.ruu.nl>
References: <57hg9b$or5 AT kannews DOT ca DOT newbridge DOT com> <329C4CD4 DOT 7474 AT cornell DOT edu>
NNTP-Posting-Host: warande1078.warande.ruu.nl
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
To: djgpp AT delorie DOT com
DJ-Gateway: from newsgroup comp.os.msdos.djgpp

Glen Miner wrote:
> > > > djgpp? I mean, right now I have a 9x9 array of bytes, and try to index
> > > > all
>
> > you might be better off just using ints (32 bits) instead of 8, or
> > worse, 16 bit quantities.
>
> What makes you say that? I can't see how this would make it faster...
> more cache misses, and an extra shift to index non-byte sized quantities.
> Not to mention the fact that there are more byte sized registers.

I believe that in 32-bit protected mode most dword register ops are
faster than the equivalent 16-bit ones on a 486 and above. Certainly on
a P6, 16-bit instructions are disproportionately slow. In any case, I
haven't seen djgpp generate any optimizations that use the byte
registers; AFAIK it uses them only in straightforward byte ops.

> > > Since this code _is_ a port, I have noticed some data conversion too. An
> > > int is no longer 16 bits long. Since I don't really need a 32 bit int in
> > > the core code, would it help at all for me to change things to "short
> > > unsigned int"s?

Unless you *need* the memory, don't bother. Offhand, though, I would say
you can hardly expect a 100-200% speed increase simply from switching
data types; for that kind of increase you really need to look at the
algorithm and the way it is implemented.

> > did you actually profile your code to see where the bottlenecks are?
>
> Yes. I know exactly where I need to improve.

I have no idea how good your C coding skills are, so don't be offended,
but careful C code can speed up a sloppy implementation by ~100%; on the
other hand, there are limits.

Check your algorithm to see which basic operations are being used
(specifically multiplies, divides, sqrts, etc.) and how many of them are
duplicated in such a way that they can be removed with a little
recoding, e.g.

    a1 = b1/(x*y);          c  = x*y;
    a2 = b2/(x*y);   ===>   a1 = b1/c;
    a3 = b3/(x*y);          a2 = b2/c;
                            a3 = b3/c;

Simplistic, but you get the point.

Check if a series of operations can be replaced by a sequence of
differential ops, e.g.

    for (i = 0; i < 100; i++) a[320*i + x] = 5;

becomes

    for (i = x; i < 32000 + x; i += 320) a[i] = 5;

which saves 100 multiplies.

Other optimizations: loop unrolling; hauling unnecessary conditional
branching out of loops and using multiple loops, one for each branch;
data caching; lookup tables for the most commonly performed ops. (See
the P.S. below for sketches of some of these.)

If you have already done all this, you are probably close to the limits
a C implementation of your algorithm can provide. Time to reexamine the
algorithm itself. Confucius he say: "All optimizations are but naught if
your algorithm sucks."

If you didn't know all this, post (or email) the pieces of code that
need doing and I'll have a go at it.

Elliott
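
P.S. To make a couple of those concrete. The function and variable
names below (fill_slow, buf, n, mode, and friends) are made up for
illustration, not taken from your code. First, hauling a loop-invariant
branch out of a loop and using one loop per case:

    /* Before: the test is re-evaluated on every iteration, even
       though 'mode' never changes inside the loop. */
    void fill_slow(unsigned char *buf, int n, int mode)
    {
        int i;
        for (i = 0; i < n; i++) {
            if (mode)
                buf[i] = 255;
            else
                buf[i] = 0;
        }
    }

    /* After: test once, then run a branch-free loop for each case. */
    void fill_fast(unsigned char *buf, int n, int mode)
    {
        int i;
        if (mode)
            for (i = 0; i < n; i++) buf[i] = 255;
        else
            for (i = 0; i < n; i++) buf[i] = 0;
    }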
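
Loop unrolling, on the same made-up buffer fill, looks like this. (gcc
will do some of this for you with -O and -funroll-loops, but
hand-unrolling a hot inner loop can still pay off.)

    /* Unrolled by four: a quarter of the counter increments and
       compares. The second loop mops up the leftover iterations
       when n is not a multiple of four. */
    void clear_unrolled(unsigned char *buf, int n)
    {
        int i = 0;
        for (; i + 4 <= n; i += 4) {
            buf[i]     = 0;
            buf[i + 1] = 0;
            buf[i + 2] = 0;
            buf[i + 3] = 0;
        }
        for (; i < n; i++)
            buf[i] = 0;
    }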
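
And a lookup table for an expensive op - here a sine table, the sort of
thing a game inner loop would use. The 256-entry resolution is an
assumption; pick whatever granularity your code needs.

    #include <math.h>

    #define STEPS 256                 /* must be a power of two */
    static double sin_tab[STEPS];

    /* Pay for the sin() calls once, at startup. */
    void init_sin_tab(void)
    {
        int i;
        for (i = 0; i < STEPS; i++)
            sin_tab[i] = sin(i * 2.0 * M_PI / STEPS);
    }

    /* Each later lookup is one array read instead of one sin() call.
       The angle is quantised to STEPS positions around the circle,
       and the mask makes any int angle wrap around safely. */
    double fast_sin(int angle)
    {
        return sin_tab[angle & (STEPS - 1)];
    }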