www.delorie.com/archives/browse.cgi   search  
Mail Archives: djgpp/1996/11/27/14:17:19

From: kagel AT quasar DOT bloomberg DOT com
Date: Wed, 27 Nov 1996 13:58:29 -0500
Message-Id: <9611271858.AA11183@quasar.bloomberg.com >
To: gminer AT Newbridge DOT COM
Cc: djgpp AT delorie DOT com
In-Reply-To: <57hg9b$or5@kannews.ca.newbridge.com> (gminer@Newbridge.COM)
Subject: Re: Optimization
Reply-To: kagel AT dg1 DOT bloomberg DOT com

   Errors-To: postmaster AT ns1
   From: gminer AT Newbridge DOT COM (Glen Miner)
   Newsgroups: comp.os.msdos.djgpp
   Date: 27 Nov 1996 13:38:51 GMT
   Organization: Newbridge Networks Corporation
   Lines: 30
   Nntp-Posting-Host: 138.120.136.238
   Dj-Gateway: from newsgroup comp.os.msdos.djgpp
   Content-Type: text
   Content-Length: 1489

   I am currently working on a program that grinds away in a recursive
   algorithm tens of thousands of times over a small group of local data. It
   doesn't do _any_ io, and only calls a select few functions. Due to the
   immense number of calculations required, this is painfully slow. I've
   optimized the algorithm a fair amount, and plan to improve it more, but
   that is only one part of the optimization process. 

   I am currently using the -O3 compile option; is there any other switches I
   should be enabling or disabling? I never need to debug the code, so symbol
   information is not an issue. 

   Can anyone suggest any hard and fast rules for c code optimization under
   djgpp? I mean, right now I have a 9x9 array of bytes, and try to index all
   of the data in it with pointers. Is this necessary, or is x=MyArray[4][6]
   just as good? 

   Since this code _is_ a port, I have noticed some data conversion too. An
   int is no longer 16 bits long. Since I don't really need a 32 bit int in
   the core code would it help at all for me to change things to "short
   unsigned int"s?

   I may end up doing the heavily used functions in assembler, but I'd like
   to make sure that the c code is as optimal as it can be first. Any
   comments, suggestions or pointers are welcomed.

The following compiles and runs were performed with GCC on DG Aviion.  I have
found similar results in DJGPP on similar test in the past though I am not at
home now to try it.  Running the following test program compiled various ways
gives results that verify that the GCC optimizer does a better job than you
can:

These compiles leave register allocation to the compiler:
--------------------------------------------------------
gcc -o tst.D tst.c
gcc -o tst.O -O tst.c
gcc -o tst.O2 -O2 tst.c
gcc -o tst.O3 -O3 tst.c

These compiles declare register variables explicitly:
----------------------------------------------------
gcc -o tst.DR tst.c -Dusereg
gcc -o tst.OR -O tst.c -Dusereg
gcc -o tst.O2R -O2 tst.c -Dusereg
gcc -o tst.O3R -O3 tst.c -Dusereg

Versions ranked by Array Indexing results worst to best:
		Opt.   Manual
Executable	Level  Regist.	Array Indexing		Pointer Deref.
-------------	-----  ------	-------------------	------------------
tst.D		None   No		5.752117		4.543308
tst.DR		None   Yes		1.621305		1.192638
tst.O		 -O    No		0.801081		0.844428
tst.OR		 -O    Yes		0.658607		0.869502
tst.O2R		 -O2   Yes		0.523563		0.642418
tst.O3R		 -O3   Yes		0.549404		0.718574
tst.O2		 -O2   No		0.499495		0.676681
tst.O3		 -O3   No		0.484081		0.709432

Clearly -O3 optimizations using array indexing and optimizer determined
register allocation is the fastest scheme.  This shows several points:

  1) Using array indexing is superior to using pointers when any level of
optimization is used.
  2) Manual register allocation is only helpful if no optimization or -O
optimization is used!
  3) Manual register allocation is most helpful with variables used in the
array indexing than pointers.

#############################  tst.c  ##################################


#include <stdlib.h>
#include <unistd.h>
#include <stdio.h>
#include <limits.h>
#include <sys/time.h>

#if defined(usereg) 
#define REG register
#else
#define REG
#endif
#if !defined(ITERS)
#define ITERS 100000
#endif

void print_diff( char *str, struct timeval st, struct timeval en );

main( int argc, char **argv)
{
    char arry[9][9];
    REG long i, j, k;
    REG char *p;
    struct timeval start, end;

    gettimeofday( &start, NULL );
    for( i=0; i<ITERS; i++) {
	for (j=0;j<9;j++)
	    for (k=0;k<9;k++) {
		arry[j][k] = j+k;
	    }
    }
    gettimeofday( &end, NULL );

    print_diff( "ARRAY", start, end );

    gettimeofday( &start, NULL );
    for( i=0; i<ITERS; i++) {
	p = &arry[0][0];
	for (j=0;j<9;j++)
	    for (k=0;k<9;k++) {
		*p++ = j+k;
	    }
    }
    gettimeofday( &end, NULL );

    print_diff( "POINTER", start, end );

}

void print_diff( char *str, struct timeval st, struct timeval en )
{
    struct timeval df;

    df.tv_sec = en.tv_sec - st.tv_sec;
    df.tv_usec = en.tv_usec - st.tv_usec;
    if (df.tv_usec < 0) {
	df.tv_sec--;
	df.tv_usec += 1000000;
    }
    printf( "%s: diff = %d.%06.6d.\n", str, df.tv_sec, df.tv_usec );
}

-- 
Art S. Kagel, kagel AT quasar DOT bloomberg DOT com

A proverb is no proverb to you 'till life has illustrated it.  -- John Keats

- Raw text -


  webmaster     delorie software   privacy  
  Copyright © 2019   by DJ Delorie     Updated Jul 2019