Date: Wed, 26 Feb 1997 11:30:06 -0600 (CST)
From: "Jesse W. Bennett"
Reply-To: Jesse Bennett
To: Eli Zaretskii
cc: Jesse Bennett, djgpp AT delorie DOT com
Subject: Re: Netlib code [was Re: flops...]
In-Reply-To:
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII

On Sun, 23 Feb 1997, Eli Zaretskii wrote:

> On 20 Feb 1997, Jesse Bennett wrote:

[snip]

> > I have done some benchmarking of the above matrix multiply code on a
> > DEC Alpha using the native DEC compilers. The "pointer based" C
> > implementation was fastest, followed by the F77 code and the "array
> > based" C code. Hence my comments.
>
> Try it with gcc. In most cases, it converts the array-based code to
> pointer-based automatically, as far as I could see, and in many cases it
> does that better than you would.

Hi Eli,

I tried this on a Linux box with gcc 2.6.3 and 2.7.2 and the results
were encouraging, but the pointer based code was still slightly faster.
When I looked at the generated assembly I could see that the array based
implementation was making better use of the x86 CISC instruction set but
the innermost instruction loop appears to have some unnecessary memory
references (I say "appears" because I am not very familiar with the x86).

The test code was:

void gemm( int m, int n, int k, double **a, double **b, double **c )
{
  /* C = AB + C */
  int i, j, l;
  double temp;

  for( i=0; i