Mail Archives: djgpp/2000/04/10/17:21:55.1

From: "Alexei A. Frounze" <alex DOT fru AT mtu-net DOT ru>
Newsgroups: comp.os.msdos.djgpp
Subject: Re: inefficiency of GCC output code & -O problem
Date: Mon, 10 Apr 2000 23:37:10 +0400
Organization: MTU-Intel ISP
Lines: 61
Message-ID: <38F22D66.42D78282@mtu-net.ru>
References: <38F20E7A DOT 3330E9A4 AT mtu-net DOT ru> <38F2250B DOT 1DC270D5 AT maths DOT unine DOT ch>
NNTP-Posting-Host: ppp102-126.dialup.mtu-net.ru
Mime-Version: 1.0
X-Trace: gavrilo.mtu.ru 955395489 73052 212.188.102.126 (10 Apr 2000 19:38:09 GMT)
X-Complaints-To: usenet-abuse AT mtu DOT ru
NNTP-Posting-Date: 10 Apr 2000 19:38:09 GMT
X-Mailer: Mozilla 4.61 [en] (Win95; I)
X-Accept-Language: en,ru
To: djgpp AT delorie DOT com
DJ-Gateway: from newsgroup comp.os.msdos.djgpp
Reply-To: djgpp AT delorie DOT com

Gautier wrote:
> 
> "Alexei A. Frounze":
> 
> > Why does GCC output so much redundant code?
> 
> Which optimisation level do you mean?

I thought it should be so without specifying any optimization level... See below.

> 
> > I mean, it always puts values into CPU registers, although it's possible to
> > do the same operation without using registers?
> 
> If you observed that with -O2, it is an optimisation strategy:
> when most of the data is loaded in registers, the program has a better chance of
> running at something close to processor speed rather than at main board speed
> (roughly speaking...). It means less data to exchange with RAM, which takes
> a lot of time.

Well, but it works with registers even when a value only needs to be zeroed,
shifted, and so on. It loads registers in very simple situations, where it would
be possible to adjust a few bytes in memory and be done.
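
To make the complaint concrete, here is a minimal sketch of the kind of code in question. The struct and function names are hypothetical, invented only for this illustration. Each operation could be done directly on memory on x86, so the assembly listing shows whether the compiler goes through a register instead.

```c
#include <stdint.h>

/* Hypothetical record; it only exists so the values live in memory. */
struct counters {
    uint32_t hits;
    uint32_t scale;
};

/* Zeroing a field: on x86 this can be a single store of an
   immediate 0 to memory, with no register needed for the value. */
void clear_hits(struct counters *c)
{
    c->hits = 0;
}

/* Shifting a field in place: x86 has shift instructions that take a
   memory operand, but a compiler may instead emit load, shift, store. */
void halve_scale(struct counters *c)
{
    c->scale >>= 1;
}
```

Compiling with something like `gcc -O2 -fomit-frame-pointer -S file.c` (the file name is a placeholder) writes the generated assembly to `file.s`, where the two functions can be compared with what one would write by hand.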

> On Intel x86s there is not much to do - there are so few registers - but
> anyway GCC is very smart at register mapping!

Not that few. That's enough. Btw, the same 3D engine runs around 12% faster when
compiled with Watcom C than when compiled with GCC (-O2, no stack frames, etc.).

Btw, Watcom C can also produce (without special optimization switches) a tmapper
inner loop as fast as an assembly subroutine. I tested it: there is no noticeable
difference between the pure C version and the ASM one. :)
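
For readers unfamiliar with the term, a texture mapper's inner loop looks roughly like the sketch below. This is not the loop from the engine discussed here. It is a generic affine span loop, and the 16.16 fixed-point coordinates, the 256x256 8-bit texture layout, and the name `draw_span` are all assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Draw one horizontal span of an affine texture mapper.
   u, v, du, dv are 16.16 fixed-point texture coordinates and steps.
   texture is assumed to be 256x256 texels, one byte each, row-major. */
void draw_span(uint8_t *dst, size_t count, const uint8_t *texture,
               uint32_t u, uint32_t v, uint32_t du, uint32_t dv)
{
    while (count--) {
        /* Integer parts of v and u pick the row and column;
           masking with 0xFF wraps both to the 0..255 range. */
        *dst++ = texture[(((v >> 16) & 0xFF) << 8) | ((u >> 16) & 0xFF)];
        u += du;
        v += dv;
    }
}
```

A loop this small is a reasonable test case for comparing compilers, since almost all of its cost is in how the compiler assigns u, v, and the pointers to registers.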

I don't think GCC is a very good optimizing compiler. Still, it has some
advantages over other compilers: it's free, it's portable, and it's multiplatform.

> On processors with a more recent design like the Alpha or even the Motorola 68K, optimisers
> try to make the best use of the 36 (16) registers and the result is beautiful.
> 
> > Also, why does GCC handle type casts of byte/word <-> dword values so awfully? It allocates
> > some extra bytes on the stack, puts the values there and gets them back...
> 
> On 32-bit x86 it is better to use only 8 of 32 bits than to allocate and manipulate
> 1 byte... But maybe you just hit a weak typing problem (I mean GCC is unsure about
> the precise types, the sizes and so on)?

It always works this way. So I forced my functions to have only 32-bit
parameters and return values. That solves this ugly problem.
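
A minimal sketch of that workaround, assuming a simple averaging function (the names `blend8` and `blend32` are made up for this example). Both versions compute the same result for inputs in 0..255. The second one never asks the compiler to narrow or widen anything at the function boundary.

```c
#include <stdint.h>

/* Narrow version: 8-bit parameters and result. The operands are
   promoted to int for the arithmetic, and the result is converted
   back to 8 bits on return. */
uint8_t blend8(uint8_t a, uint8_t b)
{
    return (uint8_t)((a + b) >> 1);
}

/* Wide version: same arithmetic with 32-bit parameters and result.
   The caller is assumed to pass values in the range 0..255. */
uint32_t blend32(uint32_t a, uint32_t b)
{
    return (a + b) >> 1;
}
```

Comparing the assembly for the two functions (for example with `gcc -O2 -S`) shows whether any extra conversion code or stack traffic appears in the narrow version on a given compiler version.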

> You should try an equivalent test on a very
> strongly typed GCC front-end like GNAT...

What's GNAT?

bye.
Alexei A. Frounze
-----------------------------------------
Homepage: http://alexfru.chat.ru
Mirror:   http://members.xoom.com/alexfru
