Mail Archives: djgpp/2000/04/13/07:44:21

From: buers AT gmx DOT de (Dieter Buerssner)
Newsgroups: comp.os.msdos.djgpp
Subject: Re: inefficiency of GCC output code & -O problem
Date: 13 Apr 2000 09:51:53 GMT
Lines: 85
Message-ID: <8d4ca1.3vvqqup.0@buerssner-17104.user.cis.dfn.de>
References: <38F20E7A DOT 3330E9A4 AT mtu-net DOT ru> <38F23A21 DOT A59621A1 AT inti DOT gov DOT ar> <38F49A45 DOT 13F0AB1 AT mtu-net DOT ru>
NNTP-Posting-Host: pec-106-34.tnt6.s2.uunet.de (149.225.106.34)
Mime-Version: 1.0
X-Trace: fu-berlin.de 955619513 7511450 149.225.106.34 (16 [17104])
X-Posting-Agent: Hamster/1.3.13.0
User-Agent: Xnews/03.02.04
To: djgpp AT delorie DOT com
DJ-Gateway: from newsgroup comp.os.msdos.djgpp
Reply-To: djgpp AT delorie DOT com

Alexei A. Frounze wrote:

>Well, it still isn't compiled with the -O2 switch, although it's 
>okay w/o it.

I will comment on only a few places, but I see similar problems
almost everywhere.

>  double X, DX;

[...]
>  short  SW, LW = 0x1B3F;

[...]

>  __asm__ __volatile__ ("
>    fstcw   (%0)
>    fldcw   (%1)
>    fldl    (%2)
>  "
>  :
>  : "g" (&SW), "g" (&LW), "g" (&X)
>  );

With the "g" constraint, your input can be a register, and then it
would work. But it can also be something as complicated as
displacement(reg1,reg2,factor), and then it won't work, because
(%0) is no longer a valid operand. Which one you get may depend on
the compiler switches. Without testing, I think that the "r"
constraint would work here. But this approach has the disadvantage
of needing more registers, and you may end up with slower code than
without the inline assembly at all. (If I understand your
well-commented ;) source correctly, the whole point of the inline
assembler is to avoid the repeated fstcw, fldcw, fstcw sequences
that would be generated by ceil.)

Also, there should be a "memory" in the clobber list (see gcc manual).
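
A minimal sketch of the "r" variant with the "memory" clobber, not
tested under DJGPP. The function name read_fpu_cw_via_pointer is
made up for illustration, and the non-x86 branch is only there so
that the sketch compiles everywhere. Current GCC releases also reject
a string literal that spans several lines, so the template is written
as an ordinary one-line string.

```c
#include <stdint.h>

/* Hypothetical helper: store the x87 control word through a pointer.
   The "r" constraint forces the address into a register, so the
   operand (%0) is always a valid memory reference. */
uint16_t read_fpu_cw_via_pointer(void)
{
    uint16_t cw = 0;
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __asm__ __volatile__ (
        "fstcw (%0)"
        :                /* no declared outputs: the write goes through the pointer */
        : "r" (&cw)      /* the address of cw, held in a register */
        : "memory"       /* the asm modifies memory the compiler was not told about */
    );
#else
    cw = 0x037F;         /* assumption for non-x86 builds: the x87 default value */
#endif
    return cw;
}
```

Without the "memory" clobber the compiler may assume that cw still
holds 0 after the asm statement, because nothing in the operand list
says that the asm writes to it.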

Ideally, you would want to write the code like this

__asm__ __volatile__ ("
   fstcw   %0
   fldcw   %1
   fldl    %2
 "
 :
 : "m" (SW), "m" (LW), "m" (X)
);

But with code like this I have seen errors like "cannot meet
constraint ...".

The only solution I found for this is to declare the variables
as volatile. Perhaps other people can comment on whether this is
guaranteed to work.
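
For comparison, a sketch of how the "m" form can be written for a
current GCC on x86. It is an illustration and not the original
poster's routine: the name ceil_x87, the choice of frndint, and the
non-x86 fallback are all assumptions. The variable written by fstcw
is declared as an output ("=m"), so the compiler knows about the
store, and "st(7)" is clobbered to reserve one free x87 stack slot
for the temporary value.

```c
#include <stdint.h>

/* Hypothetical helper: round x toward +infinity on the x87 unit,
   switching the control word once and restoring it afterwards. */
double ceil_x87(double x)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    uint16_t saved_cw, up_cw;
    double result;

    /* fstcw writes its operand, so it is an output. */
    __asm__ __volatile__ ("fstcw %0" : "=m" (saved_cw));

    /* Rounding control is bits 10-11; binary 10 selects "round up". */
    up_cw = (uint16_t)((saved_cw & ~0x0C00) | 0x0800);

    __asm__ __volatile__ (
        "fldcw %2\n\t"     /* switch to round-up mode */
        "fldl  %3\n\t"     /* push x */
        "frndint\n\t"      /* round to integer in the current mode */
        "fstpl %0\n\t"     /* pop into result, stack is balanced again */
        "fldcw %1"         /* restore the saved control word */
        : "=m" (result)
        : "m" (saved_cw), "m" (up_cw), "m" (x)
        : "st(7)"
    );
    return result;
#else
    /* Assumption for non-x86 builds: plain C, valid only while x
       fits in a long long. */
    double t = (double)(long long)x;
    return (t < x) ? t + 1.0 : t;
#endif
}
```

Here the compiler picks the addressing mode for each "m" operand
itself, so no extra registers are spent on holding addresses.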

>void T_Map (char *texture) {

It would be interesting to know what the performance difference
between this code and the code without the inline assembly was.
Because you don't change the FPU control word here, it seems to
me that gcc -O should be able to produce efficient code.

>    __asm__ __volatile__ ("
>      sarl %2, (%0)
>      sarl %2, (%1)"
>      :
>      : "g" (&du), "g" (&dv), "g" (SUB_BITS)
>    );

This should be

    __asm__ __volatile__ ("
      sarl %4, %0
      sarl %4, %1"
      : "=g" (du), "=g" (dv)
      : "0" (du), "1" (dv), "i" (SUB_BITS)
    );

But why use this? GCC will most probably produce exactly the
same code from

  du >>= SUB_BITS;
  dv >>= SUB_BITS;
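
A small sketch for checking this claim, under some assumptions: the
value 4 for SUB_BITS and the names sar_asm and sar_c are made up
here, and a single read-write operand ("+r") is used in place of the
"=g"/"0" pair. Compiling with gcc -O2 -S and comparing the two
functions shows whether the generated code differs.

```c
#define SUB_BITS 4   /* assumed value, for illustration only */

/* Arithmetic right shift through inline assembly. */
int sar_asm(int v)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __asm__ (
        "sarl %1, %0"
        : "+r" (v)          /* v is both read and written */
        : "I" (SUB_BITS)    /* x86 constraint: constant in the range 0..31 */
        : "cc"              /* the shift changes the flags */
    );
    return v;
#else
    return v >> SUB_BITS;   /* non-x86 builds fall back to plain C */
#endif
}

/* The same shift in plain C. Right-shifting a negative int is
   implementation-defined in C; GCC documents it as arithmetic. */
int sar_c(int v)
{
    return v >> SUB_BITS;
}
```

The plain C version also leaves the compiler free to combine the
shift with the neighbouring arithmetic, which an asm statement
prevents.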

-- 
Regards, Dieter
