Date: Wed, 5 Nov 1997 19:45:44 -0800 (PST)
Message-Id: <199711060345.TAA03125@adit.ap.net>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
To: Alistair Bain , djgpp AT delorie DOT com
From: Nate Eldredge
Subject: Re: Code optimisations
Precedence: bulk

At 11:27 11/4/1997 +0000, Alistair Bain wrote:
>Ok don't want to start a war here but...
>I'm new-ish to C and I am looking for ways to speed up my code.
>I don't care a whit about readability and maintainability of code, or
>the size of code generated (As long as it runs on clean 16Mb system). I
>just want the best performance poss. (for games programming).

Okay, but... if you're "new-ish" to C, focusing on performance over
maintainability might not be such a good idea. Just a suggestion.

>1) structs - Are they slow? what do they compile to?

No, not particularly. The compiler knows the offset of each member of the
struct, so something like this:

        foo.x = 3;

compiles to (pseudocode)

        mov [foo+offsetof(x)],3

A member dereference (you know, the `->' operator) may be slightly slower
than a straight pointer dereference, because it must add the member offset
to the pointer, then move indirect. But the compiler can often optimize the
addition away (on x86 a constant offset usually folds into the addressing
mode)... you might be surprised.

>2) unrolling loops - to what extent does -funroll-loops do this? Does -O3
>do this? Adding both I get an executable of exactly the same size as
>using just -O. Should I just manually unroll the loops?

`-funroll-loops' unrolls all the loops where "the number of iterations is
known at compile time" (GCC manual). Things like this:

        for (i=0; i<1000; i++)
                /* do something */;

So perhaps your loop wasn't so precisely determined.
`-funroll-all-loops' unrolls even loops where it doesn't know how many
iterations there will be, but this is generally a Bad Thing: there is still
a test and conditional jump after each iteration, and the code just gets
huge and overflows the cache. `-O3' does not unroll loops.
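To make the distinction above concrete, here is a rough sketch of the two kinds of loop. The function names and the N_SAMPLES constant are made up for illustration, and whether a particular GCC version actually unrolls either loop is something to check in the generated assembly (`gcc -S'), not something this sketch guarantees.

```c
#define N_SAMPLES 1000  /* hypothetical fixed size, known at compile time */

/* The trip count is the constant N_SAMPLES, so the compiler can see
   exactly how many iterations there are. This is the kind of loop the
   message describes as a candidate for -funroll-loops. */
long sum_fixed(const int *samples)
{
    long total = 0;
    int i;

    for (i = 0; i < N_SAMPLES; i++)
        total += samples[i];
    return total;
}

/* The trip count depends on the data: the loop runs until it meets a
   zero terminator, so the count is not known before the loop starts.
   Per the message, only -funroll-all-loops would unroll this, and every
   unrolled copy still needs its own test and conditional jump. */
long sum_until_zero(const int *samples)
{
    long total = 0;

    while (*samples != 0)
        total += *samples++;
    return total;
}
```

Compiling this once with `-O' and once with `-O -funroll-loops' and comparing the assembly output is a quick way to see whether the option changed anything for loops like these.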
If you read the GCC manual, it says that:

* `-O' performs the optimizations that have the most effect for the time
  they take;
* `-O2' performs almost all optimizations that don't trade off size for
  speed;
* `-O3' is the same as `-O2' but also inlines functions when possible.

Also, I seem to remember something about a bug causing crashes or incorrect
code when loop unrolling is used. IMO, it doesn't tend to be a win anyway.

>3) global variables - I heard, read, dreamt something about them being
>faster. ie declare all variables at top of prog and just be careful
>about accessing wrong ones, etc.

Not that I know of. An access to a global variable compiles to:

        mov [12345],42   ; where 12345 is the variable's address

while a local access becomes:

        mov [ebp-17],42  ; where the variable is 17 bytes into the stack frame

My 386 manual gives identical instruction timings for both forms, and AFAIK
this hasn't changed on newer chips. In fact, local variables may even be
kept in registers, which is faster yet.

I've found that the best options for optimizing are:

        -O3, to make the compiler work hard;
        -fomit-frame-pointer, unless you need to debug your code;
        -ffast-math, unless you need strict ANSI/IEEE floating point;
        -m486, assuming you have a 486 or better.

Here are some tips on code optimizing that I saw somewhere:

* First, profile your code and find what needs improving; it's often not
  what you think.
* A good algorithm is likely to make the most difference in speed.
* Tricks in rearranging syntax to get the compiler to make better code
  rarely help.

Hope this helps!

Nate Eldredge
eldredge AT ap DOT net