From: buers AT gmx DOT de (Dieter Buerssner)
Newsgroups: comp.os.msdos.djgpp
Subject: Re: inefficiency of GCC output code & -O problem
Date: 13 Apr 2000 09:51:53 GMT
Lines: 85
Message-ID: <8d4ca1.3vvqqup.0@buerssner-17104.user.cis.dfn.de>
References: <38F20E7A DOT 3330E9A4 AT mtu-net DOT ru> <38F23A21 DOT A59621A1 AT inti DOT gov DOT ar> <38F49A45 DOT 13F0AB1 AT mtu-net DOT ru>
NNTP-Posting-Host: pec-106-34.tnt6.s2.uunet.de (149.225.106.34)
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 8bit
X-Trace: fu-berlin.de 955619513 7511450 149.225.106.34 (16 [17104])
X-Posting-Agent: Hamster/1.3.13.0
User-Agent: Xnews/03.02.04
To: djgpp AT delorie DOT com
DJ-Gateway: from newsgroup comp.os.msdos.djgpp
Reply-To: djgpp AT delorie DOT com

Alexei A. Frounze wrote:

>Well, it still isn't compiled with the -O2 switch, although it's
>okay w/o it.

I will comment on only a few places, but I see similar things almost
everywhere.

>  double X, DX;
[...]
>  short SW, LW = 0x1B3F;
[...]
>  __asm__ __volatile__ ("
>    fstcw (%0)
>    fldcw (%1)
>    fldl  (%2)
>  "
>  :
>  : "g" (&SW), "g" (&LW), "g" (&X)
>  );

With the "g" constraint, your input can be a register, and then it
works. It can also be something as complicated as
displacement(reg1,reg2,factor), and then it won't work, because "(%0)"
no longer expands to a valid address. Which one you get may depend on
the compiler switches.

Without testing, I think that the "r" constraint would work here. But
this approach has the disadvantage of needing more registers, and you
may end up with slower code than with no inline assembly at all. (If I
understand your well-commented ;) source correctly, the whole point of
the inline assembler is to avoid the multiple fstcw, fldcw, fstcw
sequences that would be generated by ceil.) Also, there should be a
"memory" in the clobber list (see the gcc manual).
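To make the "r" suggestion concrete, here is a rough sketch of what I mean. It uses the "\n\t" string style that current gcc versions expect. The function name ceil_x87, the frndint/fstpl step, the restore of the old control word and the portable fallback are my own additions for illustration; they are not taken from your source. It also assumes one free slot on the x87 register stack, because the asm pushes and pops one value.

```c
/* Hypothetical helper: round x toward +infinity using the x87 unit. */
static double ceil_x87(double x)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    unsigned short saved_cw = 0;           /* receives the old control word */
    const unsigned short up_cw = 0x1B3F;   /* rounding control = round up   */
    double result = 0.0;

    /* Every operand is an address held in a register ("r"), so "(%n)"
       always expands to a valid memory reference. The "memory" clobber
       tells gcc that the asm reads and writes memory behind its back. */
    __asm__ __volatile__ (
        "fstcw  (%0)\n\t"   /* save the current control word      */
        "fldcw  (%1)\n\t"   /* switch to round-up mode            */
        "fldl   (%2)\n\t"   /* push x onto the x87 stack          */
        "frndint\n\t"       /* round to integer in current mode   */
        "fstpl  (%3)\n\t"   /* pop the rounded value into result  */
        "fldcw  (%0)"       /* restore the saved control word     */
        :
        : "r" (&saved_cw), "r" (&up_cw), "r" (&x), "r" (&result)
        : "memory");
    return result;
#else
    /* Fallback for other targets (assumption: x fits in a long long). */
    double t = (double)(long long)x;
    return (t < x) ? t + 1.0 : t;
#endif
}
```

Note that this ties up four registers just for addresses, which is exactly the cost mentioned above.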

Ideally, you would want to write the code like this:

  __asm__ __volatile__ ("
    fstcw %0
    fldcw %1
    fldl  %2
  "
  :
  : "m" (SW), "m" (LW), "m" (X)
  );

But here I have seen errors like "cannot meet constraint ... ". The
only solution I found for this is to declare the variables as volatile.
Perhaps other people can comment on whether this is guaranteed to work.

>void T_Map (char *texture) {

It would be interesting to know what the performance difference
between this code and the code without the inline assembly was.
Because you don't change the FPU control word here, it seems to me that
gcc -O should be able to produce efficient code.

>  __asm__ __volatile__ ("
>    sarl %2, (%0)
>    sarl %2, (%1)"
>  :
>  : "g" (&du), "g" (&dv), "g" (SUB_BITS)
>  );

This should be

  __asm__ __volatile__ ("
    sarl %4, %0
    sarl %4, %1"
  : "=g" (du), "=g" (dv)
  : "0" (du), "1" (dv), "i" (SUB_BITS)
  );

But why use this? Gcc will most probably produce exactly the same code
from

  du >>= SUB_BITS;
  dv >>= SUB_BITS;

--
Regards, Dieter
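
P.S. A small sketch for comparing the two shift variants side by side. The value 4 for SUB_BITS and the function names shift_plain and shift_asm are assumptions made only for this example. I have also used the "+r" read-write constraint in place of the "=g"/"0" pairing shown above; it is a different spelling of the same idea, not the code from your source. The plain C version relies on gcc shifting negative signed values arithmetically.

```c
#define SUB_BITS 4   /* assumed value, for illustration only */

/* Plain C: gcc is expected to emit sar instructions for this by itself. */
static void shift_plain(int *du, int *dv)
{
    *du >>= SUB_BITS;
    *dv >>= SUB_BITS;
}

/* Inline assembly variant of the same operation. */
static void shift_asm(int *du, int *dv)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    int u = *du, v = *dv;

    /* "+r" marks u and v as read and written in a register;
       "i" passes SUB_BITS as an immediate; sarl changes the flags,
       hence the "cc" clobber. */
    __asm__ ("sarl %2, %0\n\t"
             "sarl %2, %1"
             : "+r" (u), "+r" (v)
             : "i" (SUB_BITS)
             : "cc");
    *du = u;
    *dv = v;
#else
    shift_plain(du, dv);   /* no x86 inline assembly on this target */
#endif
}
```

Compiling both with gcc -O -S and comparing the output should show whether the asm buys anything.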