From: buers AT gmx DOT de (Dieter Buerssner)
Newsgroups: comp.os.msdos.djgpp
Subject: Re: inefficiency of GCC output code & -O problem
Date: 13 Apr 2000 21:49:32 GMT
Lines: 77
Message-ID: <8d5ljq.3vvqipv.0@buerssner-17104.user.cis.dfn.de>
References: <38F20E7A DOT 3330E9A4 AT mtu-net DOT ru> <38F23A21 DOT A59621A1 AT inti DOT gov DOT ar> <38F49A45 DOT 13F0AB1 AT mtu-net DOT ru> <8d4ca1 DOT 3vvqqup DOT 0 AT buerssner-17104 DOT user DOT cis DOT dfn DOT de> <38F60DB3 DOT E355975 AT mtu-net DOT ru>
NNTP-Posting-Host: pec-115-232.tnt7.s2.uunet.de (149.225.115.232)
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 8bit
X-Trace: fu-berlin.de 955662572 7566287 149.225.115.232 (16 [17104])
X-Posting-Agent: Hamster/1.3.13.0
User-Agent: Xnews/03.02.04
To: djgpp AT delorie DOT com
DJ-Gateway: from newsgroup comp.os.msdos.djgpp
Reply-To: djgpp AT delorie DOT com

Alexei A. Frounze wrote:

>Dieter Buerssner wrote:
>> It would be interesting to know, what the performance difference
>> of this code and the code without the inline assembly was.
>
>Well, I don't think it's possible to write extremely fast 3d renderer
>w/o ASM at all (at least on i386+ CPUs). Don't you think Wolf3d, Doom,
>Descent and Quake would not be as fast as they are, if they were written
>in pure C (even Watcom C, which was one of the best compilers then)?

I did not say that the plain C code would be faster or slower.
I just asked a question that should not be too difficult for you
to answer, because the C code is already there, in the comments.

>> But, why use this? Gcc will most probably produce exactly the
>> same code by
>> 
>>   du >>= SUB_BITS;
>>   dv >>= SUB_BITS;
>
>It will load EAX, shift EAX and put the result back instead of a plain
>shift using a memory reference. At least I saw this in disassembly.

Not here. When compiling your code with gcc -O -S (gcc 2.95.2),
for the interesting lines

      du = u2 - u1;
      dv = v2 - v1;
//    sar du, SUB_BITS
//    sar dv, SUB_BITS
      __asm__ __volatile__ ("
        sarl %2, (%0)
        sarl %2, (%1)"
        :
        : "g" (&du), "g" (&dv), "g" (SUB_BITS)
      );

I get the following assembler output:

[All other parts snipped]

        movl -132(%ebp),%ebx
        subl %edi,%ebx
        movl %ebx,-184(%ebp)
        movl -136(%ebp),%ebx
        subl %esi,%ebx
        leal -96(%ebp),%edx
        leal -100(%ebp),%eax
/APP

        sarl $4, (%edx)
        sarl $4, (%eax)
/NO_APP

So, this happens to produce correct code, even though the inline
assembly is wrong, as I explained in another post.
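
For illustration only, and without claiming to reproduce the explanation
from that other post: the statement above has no output operands and no
"memory" clobber, so gcc is never told that du and dv change, and the "g"
constraint would also allow a non-register operand, for which "(%0)" does
not assemble. A minimal sketch of how such a shift could be described to
gcc follows. The helper name sar_sub_bits is hypothetical, and
SUB_BITS = 4 is assumed from the $4 in the listing.

```c
#define SUB_BITS 4  /* assumed value, matching the $4 in the listing */

/* Hypothetical helper: arithmetic right shift by SUB_BITS. */
static inline int sar_sub_bits(int x)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __asm__ ("sarl %1, %0"
             : "+rm" (x)        /* read-write operand: register or memory */
             : "I" (SUB_BITS)   /* immediate shift count in the range 0..31 */
             : "cc");           /* sar changes the flags */
    return x;
#else
    /* Right shift of a negative int is implementation-defined in C;
       gcc performs an arithmetic shift. */
    return x >> SUB_BITS;
#endif
}
```

Because x is declared as a read-write operand, gcc knows the value is
modified and may keep it in a register, so no address has to be taken.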

If I use the C code quoted above instead, the output is
the following:
  
        movl -132(%ebp),%ebx
        subl %edi,%ebx
        movl -136(%ebp),%edx
        subl %esi,%edx
        sarl $4,%ebx
        sarl $4,%edx

So, which do you think is more efficient?

Even when the values to be shifted are not in registers, gcc will
usually produce code like
  
       sarl $4, -4(%ebp)
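
A small test case for checking this on a given compiler, as a sketch
only: the function name shift_deltas is hypothetical and SUB_BITS = 4 is
assumed. Because the operands are reached through pointers, they live in
memory, and the output of gcc -O -S shows whether the shift is applied
to the memory operand directly. The exact instructions depend on the
compiler version and options.

```c
#define SUB_BITS 4  /* assumed value, matching the $4 in the listing */

/* Hypothetical test function: both operands are in memory, so the
   compiler is free to shift them in place. */
void shift_deltas(int *du, int *dv)
{
    /* Right shift of a negative int is implementation-defined in C;
       gcc performs an arithmetic shift. */
    *du >>= SUB_BITS;
    *dv >>= SUB_BITS;
}
```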

-- 
Regards, Dieter