Date: Thu, 7 Dec 2000 14:08:03 +0200
From: Tuukka Toivonen
X-Sender: tutoivon AT paju DOT oulu DOT fi
To: pgcc AT delorie DOT com
Subject: horrible code
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Reply-To: pgcc AT delorie DOT com

Look at the following routine:

    #define GETBITS(x,a,b) (((x)>>(a)) & ((1ULL<<((b)-(a)+1ULL)) - 1ULL))
    #define Ntt_type unsigned int
    #define Long_Ntt_type unsigned long long int

    Ntt_type ntt_mul_1(Ntt_type a, Ntt_type b)
    {
        Long_Ntt_type al, bl, rl, lo, hi;
        Ntt_type r;

        al = a;
        bl = b;
        rl = al * bl;
        r = (Ntt_type)(rl>>1);
        return r;
    }

I compiled it with the options described on the www.athlonlinux.org pages:

    -s -O3 -fomit-frame-pointer -Wall -mpentiumpro -march=pentiumpro
    -malign-functions=4 -funroll-loops -fexpensive-optimizations
    -malign-double -fschedule-insns2 -S

(-mwide-multiply had no effect). I'm using pgcc-2.95.2 (from --version).

The generated code is horrible. The major weird thing is that pgcc
generates _three_ multiply instructions. It should be clear that just one
is necessary, since both input values are only 32 bits wide.

Ok, maybe the compiler is just stupid... but try removing the one right
shift at the end. Surprisingly, pgcc is now smart enough to use just one
multiplication!

The weird part is that a small change at the _output_ changes how well pgcc
understands the _input_ numbers. Hmm, now that I think about it, it probably
doesn't. But bits 32..63 of the input numbers have no effect on output bits
0..31 unless you do the right shift. So that must be the reason.

[ For the original non-simplified version of this routine, pgcc generated
68 instructions with 3 multiplications, while it is easily possible to do it
by hand in 16 instructions and one multiplication. So GCC still doesn't
replace an assembly programmer...]

Ps. I don't mean to say that pgcc isn't good, I'm just pointing out a
weakness in it. Although the developers probably already know about this,
I thought it was interesting.
Generated asm code follows:

    ntt_mul_1:
            subl $28,%esp
            pushl %ebp
            pushl %edi
            pushl %esi
            pushl %ebx
            movl 48(%esp),%ecx
            movl 52(%esp),%eax
            xorl %edx,%edx
            movl %eax,24(%esp)
            movl %edx,28(%esp)
            mull %ecx
            movl 28(%esp),%esi
            imull %ecx,%esi
            xorl %ebx,%ebx
            movl %eax,%edi
            movl 24(%esp),%eax
            imull %ebx,%eax
            movl %edx,%ebp
            addl %esi,%ebp
            addl %eax,%ebp
            shrdl $1,%ebp,%edi
            shrl $1,%ebp
            movl %edi,%eax
            popl %ebx
            popl %esi
            popl %edi
            popl %ebp
            addl $28,%esp
            ret
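
To make the arithmetic behind the three multiplications concrete, here is a minimal C sketch. The helper names (mul64_three, mul_shift_ref, mul_shift_one, check_one_multiply_suffices) are made up for illustration and are not part of the original routine. It assumes the compiler expands a 64x64-bit product into 32-bit halves in the usual way: one widening multiply for the low halves plus two cross terms. When both high halves are zero, the cross terms vanish and one multiply is enough.

```c
#include <assert.h>
#include <stdint.h>

/* Same computation as ntt_mul_1 in the post: widen, multiply, shift right by 1. */
static uint32_t mul_shift_ref(uint32_t a, uint32_t b)
{
    uint64_t rl = (uint64_t)a * (uint64_t)b;
    return (uint32_t)(rl >> 1);
}

/* Generic low 64 bits of a 64x64-bit product, built from 32-bit halves.
 * This uses three multiplies: one widening product and two cross terms. */
static uint64_t mul64_three(uint64_t x, uint64_t y)
{
    uint32_t xl = (uint32_t)x, xh = (uint32_t)(x >> 32);
    uint32_t yl = (uint32_t)y, yh = (uint32_t)(y >> 32);

    uint64_t low = (uint64_t)xl * yl;                 /* 32x32 -> 64 bits */
    /* Only the low 32 bits of the cross terms survive the shift by 32. */
    uint32_t cross = (uint32_t)((uint64_t)xh * yl + (uint64_t)xl * yh);

    return low + ((uint64_t)cross << 32);
}

/* With zero-extended 32-bit inputs, xh == yh == 0, so cross == 0 and a
 * single widening multiply gives the whole 64-bit product. */
static uint32_t mul_shift_one(uint32_t a, uint32_t b)
{
    uint64_t low = (uint64_t)a * b;   /* the only multiply needed */
    return (uint32_t)(low >> 1);      /* bit 32 of the product becomes bit 31 */
}

/* Hypothetical self-check comparing the variants on a few inputs. */
static void check_one_multiply_suffices(void)
{
    assert(mul_shift_one(3u, 5u) == mul_shift_ref(3u, 5u));
    assert(mul_shift_one(0x80000000u, 2u) == mul_shift_ref(0x80000000u, 2u));
    assert(mul64_three(0xFFFFFFFFu, 0xFFFFFFFFu) ==
           (uint64_t)0xFFFFFFFFu * 0xFFFFFFFFu);
}
```

In the listing above, both imull instructions multiply by a register that holds zero (%esi is loaded from the zeroed slot at 28(%esp), and %ebx was just cleared with xorl), so they correspond to the two cross terms of mul64_three with xh and yh equal to zero. The case a = 0x80000000, b = 2 shows why the shift matters: the product is 2^32, so bit 32 of the 64-bit result ends up in bit 31 of the output, which is why the upper half of the product cannot simply be discarded.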