From: buers AT gmx DOT de (Dieter Buerssner)
Newsgroups: comp.os.msdos.djgpp
Subject: Re: inefficiency of GCC output code & -O problem
Date: 14 Apr 2000 14:47:50 GMT
Lines: 62
Message-ID: <8d7i78.3vvqipv.0@buerssner-17104.user.cis.dfn.de>
References: <38F20E7A DOT 3330E9A4 AT mtu-net DOT ru> <38F23A21 DOT A59621A1 AT inti DOT gov DOT ar> <38F49A45 DOT 13F0AB1 AT mtu-net DOT ru> <8d4ca1 DOT 3vvqqup DOT 0 AT buerssner-17104 DOT user DOT cis DOT dfn DOT de> <38F60DB3 DOT E355975 AT mtu-net DOT ru> <8d5ljq DOT 3vvqipv DOT 0 AT buerssner-17104 DOT user DOT cis DOT dfn DOT de> <38F6A29B DOT 3AAEC0E AT mtu-net DOT ru>
NNTP-Posting-Host: pec-107-11.tnt6.s2.uunet.de (149.225.107.11)
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 8bit
X-Trace: fu-berlin.de 955723670 7804338 149.225.107.11 (16 [17104])
X-Posting-Agent: Hamster/1.3.13.0
User-Agent: Xnews/03.02.04
To: djgpp AT delorie DOT com
DJ-Gateway: from newsgroup comp.os.msdos.djgpp
Reply-To: djgpp AT delorie DOT com

Alexei A. Frounze wrote:

>Dieter Buerssner wrote:
>> 
>> Alexei A. Frounze wrote:
>> 
>> I did not say that the plain C code would be faster or slower.
>> I just asked a question that should not be too difficult for you
>> to answer, because the C code is already there, in comments.

Note that I was referring to the T_Map function here, which does
not need to fiddle with the FPU control word.

>Not really. Actually, my ASM code improves the performance greatly. And

Can you quantify this?

>I can't compare ASM vs optimized plain C because GCC/AS doesn't compile
>the source with -O2 switch.

Why not? With all the uses of the address-of operator "&", gcc
won't be able to optimize your code very much. If you use &var, var
usually cannot be kept in a register. I have shown you one example
where this can be seen clearly. Also, the inner loops seem to be
written almost entirely in inline assembler, so you can't expect
much optimization anyway.
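
A minimal sketch of the "&" point, not taken from the code under
discussion: the function names below are hypothetical. The first loop
passes &total to a helper, so total has to live in memory unless the
compiler inlines the helper; the second loop leaves the compiler free to
keep total in a register. Comparing the output of gcc -O2 -S for both is
one way to see whether it makes a difference on a given compiler.

```c
#include <stddef.h>

/* Hypothetical helper: it receives the accumulator by address, so the
   caller must provide a memory location for it (unless the call is
   inlined by the compiler). */
void add_to(int *acc, int x)
{
    *acc += x;
}

/* Sums v[0..n-1] through the helper; &total escapes on every call. */
int sum_via_pointer(const int *v, size_t n)
{
    int total = 0;
    size_t i;

    for (i = 0; i < n; i++)
        add_to(&total, v[i]);
    return total;
}

/* Same result without taking the address of total, so total is a
   candidate for a register in the loop. */
int sum_in_register(const int *v, size_t n)
{
    int total = 0;
    size_t i;

    for (i = 0; i < n; i++)
        total += v[i];
    return total;
}
```

Both functions compute the same sum; only the generated code may differ.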

To "fix" your code temporarily, you could try replacing every "g"
constraint with "r" and adding "memory" to the clobber list. But of
course you shouldn't do this blindly. Then you may be able to compile
it with -O2. Post the results.
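
A small sketch of what such a statement can look like, under my own
assumptions: add_in_place is a hypothetical function, not one from the
source being discussed, and the asm is x86 AT&T syntax. The template
dereferences operand 0 as "(%0)", which only assembles if gcc puts the
pointer in a register. With "g", gcc may also pick a memory or immediate
operand, and the template then expands to something the assembler rejects.
The "memory" clobber tells gcc that the asm touches memory not named in
the operand list.

```c
/* Adds b to the int that p points to.
   Hypothetical example; falls back to plain C on non-x86 targets. */
void add_in_place(int *p, int b)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __asm__ __volatile__(
        "addl %1, (%0)"        /* *p += b */
        :                      /* no outputs: result is stored via p */
        : "r" (p), "r" (b)     /* both operands forced into registers */
        : "memory", "cc");     /* writes memory, changes the flags */
#else
    *p += b;
#endif
}
```

Used as add_in_place(&x, 2), this adds 2 to x. The plain C line in the
fallback branch does the same job and gives the optimizer more freedom.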

>> Not here. When compiling your code with gcc -O -S (gcc 2.95.2),
>> for the interesting lines
>> So, this happens to produce correct code, even if the inline
>> assembly is wrong, as I explained in another post.

After looking at my own post, I am not sure whether this is
correct. There are two occurrences of the code snippet in the
source. I looked at the assembler output of one, which seemed correct.
I pasted the other, and that one seems incorrect. Anyway, the C code
produces better assembly than your inline code with the references
ever could.

>But -O is not enough for me. ;) I still need -O2.

Then try it yourself with -O2; it takes less than a minute,
and you will see no significant difference.

As you have noted, gcc may have optimization bugs. (It certainly
does; I have found at least one in the current version.) But
throwing buggy code at gcc and then complaining is hardly
a way to prove this.

One final suggestion: stay away from inline assembly (for now).
I have seen some snippets you have posted (and mailed),
and almost all of them were clearly wrong.

To quote Donald E. Knuth from memory:
  "Premature optimization is the root of all evil."

-- 
Regards, Dieter