From: kagel AT quasar DOT bloomberg DOT com Date: Wed, 27 Nov 1996 13:58:29 -0500 Message-Id: <9611271858.AA11183@quasar.bloomberg.com > To: gminer AT Newbridge DOT COM Cc: djgpp AT delorie DOT com In-Reply-To: <57hg9b$or5@kannews.ca.newbridge.com> (gminer@Newbridge.COM) Subject: Re: Optimization Reply-To: kagel AT dg1 DOT bloomberg DOT com Errors-To: postmaster AT ns1 From: gminer AT Newbridge DOT COM (Glen Miner) Newsgroups: comp.os.msdos.djgpp Date: 27 Nov 1996 13:38:51 GMT Organization: Newbridge Networks Corporation Lines: 30 Nntp-Posting-Host: 138.120.136.238 Dj-Gateway: from newsgroup comp.os.msdos.djgpp Content-Type: text Content-Length: 1489 I am currently working on a program that grinds away in a recursive algorithm tens of thousands of times over a small group of local data. It doesn't do _any_ io, and only calls a select few functions. Due to the immense number of calculations required, this is painfully slow. I've optimized the algorithm a fair amount, and plan to improve it more, but that is only one part of the optimization process. I am currently using the -O3 compile option; is there any other switches I should be enabling or disabling? I never need to debug the code, so symbol information is not an issue. Can anyone suggest any hard and fast rules for c code optimization under djgpp? I mean, right now I have a 9x9 array of bytes, and try to index all of the data in it with pointers. Is this necessary, or is x=MyArray[4][6] just as good? Since this code _is_ a port, I have noticed some data conversion too. An int is no longer 16 bits long. Since I don't really need a 32 bit int in the core code would it help at all for me to change things to "short unsigned int"s? I may end up doing the heavily used functions in assembler, but I'd like to make sure that the c code is as optimal as it can be first. Any comments, suggestions or pointers are welcomed. The following compiles and runs were performed with GCC on DG Aviion. I have found similar results in DJGPP on similar test in the past though I am not at home now to try it. Running the following test program compiled various ways gives results that verify that the GCC optimizer does a better job than you can: These compiles leave register allocation to the compiler: -------------------------------------------------------- gcc -o tst.D tst.c gcc -o tst.O -O tst.c gcc -o tst.O2 -O2 tst.c gcc -o tst.O3 -O3 tst.c These compiles declare register variables explicitly: ---------------------------------------------------- gcc -o tst.DR tst.c -Dusereg gcc -o tst.OR -O tst.c -Dusereg gcc -o tst.O2R -O2 tst.c -Dusereg gcc -o tst.O3R -O3 tst.c -Dusereg Versions ranked by Array Indexing results worst to best: Opt. Manual Executable Level Regist. Array Indexing Pointer Deref. ------------- ----- ------ ------------------- ------------------ tst.D None No 5.752117 4.543308 tst.DR None Yes 1.621305 1.192638 tst.O -O No 0.801081 0.844428 tst.OR -O Yes 0.658607 0.869502 tst.O2R -O2 Yes 0.523563 0.642418 tst.O3R -O3 Yes 0.549404 0.718574 tst.O2 -O2 No 0.499495 0.676681 tst.O3 -O3 No 0.484081 0.709432 Clearly -O3 optimizations using array indexing and optimizer determined register allocation is the fastest scheme. This shows several points: 1) Using array indexing is superior to using pointers when any level of optimization is used. 2) Manual register allocation is only helpful if no optimization or -O optimization is used! 3) Manual register allocation is most helpful with variables used in the array indexing than pointers. ############################# tst.c ################################## #include #include #include #include #include #if defined(usereg) #define REG register #else #define REG #endif #if !defined(ITERS) #define ITERS 100000 #endif void print_diff( char *str, struct timeval st, struct timeval en ); main( int argc, char **argv) { char arry[9][9]; REG long i, j, k; REG char *p; struct timeval start, end; gettimeofday( &start, NULL ); for( i=0; i