Mail Archives: djgpp/1996/11/27/14:17:19
   Errors-To: postmaster AT ns1
   From: gminer AT Newbridge DOT COM (Glen Miner)
   Newsgroups: comp.os.msdos.djgpp
   Date: 27 Nov 1996 13:38:51 GMT
   Organization: Newbridge Networks Corporation
   Lines: 30
   Nntp-Posting-Host: 138.120.136.238
   Dj-Gateway: from newsgroup comp.os.msdos.djgpp
   Content-Type: text
   Content-Length: 1489
   I am currently working on a program that grinds away in a recursive
   algorithm tens of thousands of times over a small group of local data. It
   doesn't do _any_ io, and only calls a select few functions. Due to the
   immense number of calculations required, this is painfully slow. I've
   optimized the algorithm a fair amount, and plan to improve it more, but
   that is only one part of the optimization process. 
   I am currently using the -O3 compile option; is there any other switches I
   should be enabling or disabling? I never need to debug the code, so symbol
   information is not an issue. 
   Can anyone suggest any hard and fast rules for c code optimization under
   djgpp? I mean, right now I have a 9x9 array of bytes, and try to index all
   of the data in it with pointers. Is this necessary, or is x=MyArray[4][6]
   just as good? 
   Since this code _is_ a port, I have noticed some data conversion too. An
   int is no longer 16 bits long. Since I don't really need a 32 bit int in
   the core code would it help at all for me to change things to "short
   unsigned int"s?
   I may end up doing the heavily used functions in assembler, but I'd like
   to make sure that the c code is as optimal as it can be first. Any
   comments, suggestions or pointers are welcomed.
The following compiles and runs were performed with GCC on DG Aviion.  I have
found similar results in DJGPP on similar test in the past though I am not at
home now to try it.  Running the following test program compiled various ways
gives results that verify that the GCC optimizer does a better job than you
can:
These compiles leave register allocation to the compiler:
--------------------------------------------------------
gcc -o tst.D tst.c
gcc -o tst.O -O tst.c
gcc -o tst.O2 -O2 tst.c
gcc -o tst.O3 -O3 tst.c
These compiles declare register variables explicitly:
----------------------------------------------------
gcc -o tst.DR tst.c -Dusereg
gcc -o tst.OR -O tst.c -Dusereg
gcc -o tst.O2R -O2 tst.c -Dusereg
gcc -o tst.O3R -O3 tst.c -Dusereg
Versions ranked by Array Indexing results worst to best:
		Opt.   Manual
Executable	Level  Regist.	Array Indexing		Pointer Deref.
-------------	-----  ------	-------------------	------------------
tst.D		None   No		5.752117		4.543308
tst.DR		None   Yes		1.621305		1.192638
tst.O		 -O    No		0.801081		0.844428
tst.OR		 -O    Yes		0.658607		0.869502
tst.O2R		 -O2   Yes		0.523563		0.642418
tst.O3R		 -O3   Yes		0.549404		0.718574
tst.O2		 -O2   No		0.499495		0.676681
tst.O3		 -O3   No		0.484081		0.709432
Clearly -O3 optimizations using array indexing and optimizer determined
register allocation is the fastest scheme.  This shows several points:
  1) Using array indexing is superior to using pointers when any level of
optimization is used.
  2) Manual register allocation is only helpful if no optimization or -O
optimization is used!
  3) Manual register allocation is most helpful with variables used in the
array indexing than pointers.
#############################  tst.c  ##################################
#include <stdlib.h>
#include <unistd.h>
#include <stdio.h>
#include <limits.h>
#include <sys/time.h>
#if defined(usereg) 
#define REG register
#else
#define REG
#endif
#if !defined(ITERS)
#define ITERS 100000
#endif
void print_diff( char *str, struct timeval st, struct timeval en );
main( int argc, char **argv)
{
    char arry[9][9];
    REG long i, j, k;
    REG char *p;
    struct timeval start, end;
    gettimeofday( &start, NULL );
    for( i=0; i<ITERS; i++) {
	for (j=0;j<9;j++)
	    for (k=0;k<9;k++) {
		arry[j][k] = j+k;
	    }
    }
    gettimeofday( &end, NULL );
    print_diff( "ARRAY", start, end );
    gettimeofday( &start, NULL );
    for( i=0; i<ITERS; i++) {
	p = &arry[0][0];
	for (j=0;j<9;j++)
	    for (k=0;k<9;k++) {
		*p++ = j+k;
	    }
    }
    gettimeofday( &end, NULL );
    print_diff( "POINTER", start, end );
}
void print_diff( char *str, struct timeval st, struct timeval en )
{
    struct timeval df;
    df.tv_sec = en.tv_sec - st.tv_sec;
    df.tv_usec = en.tv_usec - st.tv_usec;
    if (df.tv_usec < 0) {
	df.tv_sec--;
	df.tv_usec += 1000000;
    }
    printf( "%s: diff = %d.%06.6d.\n", str, df.tv_sec, df.tv_usec );
}
-- 
Art S. Kagel, kagel AT quasar DOT bloomberg DOT com
A proverb is no proverb to you 'till life has illustrated it.  -- John Keats
- Raw text -