From: buers AT gmx DOT de (Dieter Buerssner)
Newsgroups: comp.os.msdos.djgpp
Subject: Re: inefficiency of GCC output code & -O problem
Date: 13 Apr 2000 21:49:32 GMT
Lines: 77
Message-ID: <8d5ljq.3vvqipv.0@buerssner-17104.user.cis.dfn.de>
References: <38F20E7A DOT 3330E9A4 AT mtu-net DOT ru> <38F23A21 DOT A59621A1 AT inti DOT gov DOT ar> <38F49A45 DOT 13F0AB1 AT mtu-net DOT ru> <8d4ca1 DOT 3vvqqup DOT 0 AT buerssner-17104 DOT user DOT cis DOT dfn DOT de> <38F60DB3 DOT E355975 AT mtu-net DOT ru>
NNTP-Posting-Host: pec-115-232.tnt7.s2.uunet.de (149.225.115.232)
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 8bit
X-Trace: fu-berlin.de 955662572 7566287 149.225.115.232 (16 [17104])
X-Posting-Agent: Hamster/1.3.13.0
User-Agent: Xnews/03.02.04
To: djgpp AT delorie DOT com
DJ-Gateway: from newsgroup comp.os.msdos.djgpp
Reply-To: djgpp AT delorie DOT com

Alexei A. Frounze wrote:

>Dieter Buerssner wrote:
>> It would be interesting to know, what the performance difference
>> of this code and the code without the inline assembly was.
>
>Well, I don't think it's possible to write extremely fast 3d renderer
>w/o ASM at all (at least on i386+ CPUs). Don't you think Wolf3d, Doom,
>Descent and Quake would not be as fast as they are, if they were written
>in pure C (even Watcom C, which was one of the best compilers then)?

I did not say that the plain C code would be faster or slower. I just
asked a question that may not be too difficult for you to answer,
because the C code is already there, in comments.

>> But, why use this? Gcc will most probably produce exactly the
>> same code by
>>
>> du >>= SUB_BITS;
>> dv >>= SUB_BITS;
>
>It will load EAX, shift EAX and put the result back instead of plane
>shift using a memory reference. At least I saw this in disassembly.

Not here.
When compiling your code with gcc -O -S (gcc 2.95.2), for the
interesting lines

  du = u2 - u1;
  dv = v2 - v1;
//  sar du, SUB_BITS
//  sar dv, SUB_BITS
  __asm__ __volatile__ ("
    sarl %2, (%0)
    sarl %2, (%1)"
    :
    : "g" (&du), "g" (&dv), "g" (SUB_BITS)
  );

I get the following assembler output:

[All other parts snipped]
        movl -132(%ebp),%ebx
        subl %edi,%ebx
        movl %ebx,-184(%ebp)
        movl -136(%ebp),%ebx
        subl %esi,%ebx
        leal -96(%ebp),%edx
        leal -100(%ebp),%eax
/APP
        sarl $4, (%edx)
        sarl $4, (%eax)
/NO_APP

So, this happens to produce correct code, even though the inline
assembly is wrong, as I explained in another post.

If I use the C code that is cited above, the output is the following:

        movl -132(%ebp),%ebx
        subl %edi,%ebx
        movl -136(%ebp),%edx
        subl %esi,%edx
        sarl $4,%ebx
        sarl $4,%edx

So, which do you think is more efficient? Even when the values to be
shifted are not in registers, gcc will usually produce code like

        sarl $4, -4(%ebp)

-- 
Regards, Dieter
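
For comparison, here is a minimal sketch of how the two shifts could be written as extended asm so that gcc is told which values are modified. It is not code from this thread. The helper name shift_both and the value SUB_BITS = 4 (read off the $4 in the listing above) are assumptions for illustration. The constraints used are "+rm" (read-write operand, register or memory), "I" (immediate in 0..31 on x86) and a "cc" clobber for the flags. On other targets the sketch falls back to the plain C shifts.

```c
/* Sketch only: shift_both() is a hypothetical helper, not from the thread. */
#define SUB_BITS 4  /* assumed; matches the $4 in the listing above */

/* Arithmetic right shift of *du and *dv by SUB_BITS. */
static void shift_both(int *du, int *dv)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    /* "+rm": each value is both read and written, in a register or in memory.
       "I":   an immediate constant in the range 0..31.
       "cc":  the shifts change the condition flags. */
    __asm__ ("sarl %2, %0\n\t"
             "sarl %2, %1"
             : "+rm" (*du), "+rm" (*dv)
             : "I" (SUB_BITS)
             : "cc");
#else
    /* Plain C; gcc implements >> on negative signed ints as an arithmetic shift. */
    *du >>= SUB_BITS;
    *dv >>= SUB_BITS;
#endif
}
```

Called as shift_both(&du, &dv) with du = -64 and dv = 48, this leaves du == -4 and dv == 3. As the listings above show, the plain C shifts already let gcc keep both values in registers, so the asm form is of interest mainly as an illustration of the constraints.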