Mail Archives: djgpp/1997/02/28/14:45:00
>>>>> "Jesse" == Jesse W Bennett <jesse AT lenny DOT dseg DOT ti DOT com> writes:
Jesse> void gemm( int m, int n, int k, double **a, double **b, double **c )
Jesse> {
Jesse> /* C = AB + C */
Jesse> int i, j, l;
Jesse> double temp;
Jesse> for( i=0; i<m; i++ )
Jesse> for( l=0; l<k; l++ )
Jesse> {
Jesse> temp = a[i][l];
Jesse> for( j=0; j<n; j++ )
Jesse> c[i][j] += temp * b[l][j];
Jesse> }
Jesse> }
Jesse> compiled with gcc -O2 -S gemm.c
Jesse> The generated assembly for the inner loop is:
Jesse> L13:
Jesse> movl (%edi),%edx
Jesse> movl (%esi),%eax
Jesse> fld %st(0)
Jesse> fmull (%eax,%ecx,8)
Jesse> faddl (%edx,%ecx,8)
Jesse> fstpl (%edx,%ecx,8)
Jesse> incl %ecx
Jesse> cmpl %ecx,12(%ebp)
Jesse> jg L13
Jesse> It is not clear to me why the edx and eax registers are being reloaded
Jesse> each iteration.
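The reloads are most likely an aliasing effect. c[i] and b[l] are fetched from the pointer arrays c and b. gcc of this vintage does no type-based alias analysis, so it must assume that the store to c[i][j] might overwrite one of those pointers, and it fetches both again on every pass. A common workaround is to copy the row pointers into locals whose address is never taken. The sketch below shows this in C; the function name is hypothetical. A store through ci[j] cannot change such a local, so nothing forces the compiler to refetch the row pointers inside the j loop.

```c
/* Same computation and loop order as the quoted gemm() (C = AB + C),
   but the row pointers are copied into locals.  A store through ci[j]
   cannot modify a local whose address is never taken, so the compiler
   does not have to reload c[i] and b[l] on every trip of the j loop. */
void gemm_hoisted(int m, int n, int k, double **a, double **b, double **c)
{
    int i, j, l;

    for (i = 0; i < m; i++) {
        double *ci = c[i];            /* row i of C, fetched once per i */
        const double *ai = a[i];      /* row i of A, fetched once per i */
        for (l = 0; l < k; l++) {
            const double temp = ai[l];
            const double *bl = b[l];  /* row l of B, fetched once per l */
            for (j = 0; j < n; j++)
                ci[j] += temp * bl[j];
        }
    }
}
```

With 2x2 inputs A = [[1,2],[3,4]], B = [[5,6],[7,8]] and C = [[0,0],[1,1]], C becomes [[19,22],[44,51]].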
I can't show DJGPP g77 output at present, but I'd expect the generated code
to be the same as this. (On the 586, and especially on the PPro, the speed
will actually be determined by how your doubles happen to get aligned,
sigh.)
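You can check at run time whether an array of doubles starts on an 8-byte boundary. The following is a minimal sketch that assumes a C99 <stdint.h> for uintptr_t, which not every compiler of this era provides. The helper names are hypothetical. The second function gets an 8-byte-aligned block by over-allocating 7 bytes and rounding the address up, so it does not depend on whatever alignment malloc() happens to return.

```c
#include <stdint.h>
#include <stdlib.h>

/* Nonzero if p lies on an 8-byte boundary.  An array of doubles that
   starts on such a boundary never has an element split across a cache
   line (32 bytes on the Pentium and Pentium Pro). */
int is_aligned8(const void *p)
{
    return ((uintptr_t)p & 7u) == 0;
}

/* Hypothetical helper: returns room for n doubles starting on an 8-byte
   boundary.  *raw receives the pointer malloc() returned; pass that (not
   the aligned pointer) to free().  Rounding the address as an integer
   assumes a flat address space, as on x86. */
double *alloc_doubles_aligned8(size_t n, void **raw)
{
    uintptr_t addr;

    *raw = malloc(n * sizeof(double) + 7);
    if (*raw == NULL)
        return NULL;
    addr = ((uintptr_t)*raw + 7) & ~(uintptr_t)7;
    return (double *)addr;
}
```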
$ cat a.f
      subroutine gemm(m, n, k, a, b, c)
      integer i,m,n,k,l,j
      double precision a(n,m), b(n,m), c(n,m)
      do i=1,m ! poor for illustration only
        do l=1,k
          do j=1,n
            c(j,i) = c(j,i) + a(l,i)*b(j,l)
          end do
        end do
      end do
      end
$ g77 -S -O2 -v a.f
g77 version 0.5.19.1
gcc -S -O2 -v -xf77 a.f
Reading specs from /usr/lib/gcc-lib/i486-unknown-linux/2.7.2.1.f.1/specs
gcc version 2.7.2.1.f.1
/usr/lib/gcc-lib/i486-unknown-linux/2.7.2.1.f.1/f771 a.f -fset-g77-defaults -quiet -dumpbase a.f -O2 -version -fversion -o a.s
GNU F77 version 2.7.2.1.f.1 (i386 Linux/ELF) compiled by GNU C version 2.7.2.1.f.1.
GNU Fortran Front End version 0.5.19.1 compiled: Feb 1 1997 19:51:03
$ more +/L13 a.s
...skipping
addl 24(%ebp),%eax
.align 4
.L13:
movl -24(%ebp),%edi
fldl (%edi)
fmull (%eax)
faddl (%edx)
fstpl (%edx)
addl $8,%eax
addl $8,%edx
decl %ecx
jns .L13
.L8:
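Here is a rough C rendering of a.f's loop nest over flat column-major arrays with 0-based indices. The function name and the lda/ldb/ldc leading-dimension parameters are hypothetical and do not come from a.f. The dimensions follow the loop bounds: A is k-by-m, B is n-by-k and C is n-by-m. a.f simply declares all three arrays as (n,m), which is fine for the square case. Element (r, s) of a column-major array with leading dimension ld sits at index r + s*ld. The innermost j loop therefore walks both c and b with unit stride, which is what the two `addl $8` pointer bumps in .L13 do. The sketch also holds a(l,i) in a local. g77 did not do this above: it reloads the address from -24(%ebp) and refetches the value with fldl on every trip.

```c
#include <stddef.h>

/* c(j,i) += a(l,i)*b(j,l), summed over l, in a.f's i/l/j loop order.
   In column-major terms this is C = B*A + C; it touches memory in the
   same pattern as the row-major C = AB + C of the quoted gemm().
   Element (r, s) of a column-major array with leading dimension ld is
   stored at index r + s*ld (0-based). */
void gemm_colmajor(int m, int n, int k,
                   const double *a, int lda,
                   const double *b, int ldb,
                   double *c, int ldc)
{
    int i, l, j;

    for (i = 0; i < m; i++) {
        double *ci = c + (size_t)i * ldc;              /* column i of C */
        for (l = 0; l < k; l++) {
            const double ali = a[l + (size_t)i * lda]; /* a(l,i): invariant in j */
            const double *bl = b + (size_t)l * ldb;    /* column l of B */
            for (j = 0; j < n; j++)
                ci[j] += ali * bl[j];                  /* unit stride in ci and bl */
        }
    }
}
```

Called with lda = k and ldb = ldc = n, it produces the same memory contents as the row-major C routine given the same flat buffers.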