From: "Alexei A. Frounze"
Newsgroups: comp.os.msdos.djgpp
Subject: Re: inefficiency of GCC output code & -O problem
Date: Tue, 18 Apr 2000 00:47:20 +0400
Organization: MTU-Intel ISP
Lines: 214
Message-ID: <38FB7858.41B090DB@mtu-net.ru>
References: <38F9D717 DOT 9438A3F6 AT mtu-net DOT ru>
    <8df84a DOT 3vvqu6v DOT 0 AT buerssner-17104 DOT user DOT cis DOT dfn DOT de>
    <38FB4094 DOT DE7B5F4C AT mtu-net DOT ru>
    <8dfum2 DOT 3vvqu6v DOT 0 AT buerssner-17104 DOT user DOT cis DOT dfn DOT de>
NNTP-Posting-Host: ppp97-207.dialup.mtu-net.ru
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-Trace: gavrilo.mtu.ru 956007958 75509 212.188.97.207 (17 Apr 2000 21:45:58 GMT)
X-Complaints-To: usenet-abuse AT mtu DOT ru
NNTP-Posting-Date: 17 Apr 2000 21:45:58 GMT
Cc: buers AT gmx DOT de, eliz AT is DOT elta DOT co DOT il
X-Mailer: Mozilla 4.72 [en] (Win95; I)
X-Accept-Language: en,ru
To: djgpp AT delorie DOT com
DJ-Gateway: from newsgroup comp.os.msdos.djgpp
Reply-To: djgpp AT delorie DOT com

Dieter Buerssner wrote:

> I referred to the T_Map() function, you posted to this group. This
> can clearly be (quite efficiently) written in C. I didn't look at
> span().

Then you didn't have to say "This is not true.":
------------------8<----------------------
...
>Not really. The inner loop in my tmapper can not be written in pure C.
>Believe me.

This is not true.
...
------------------8<----------------------
???

> >> I get rid of all your inline assembly in T_Map. I will be allowed
> >> to add one single line (say less than 50 characters from __asm__
> >> up to the closing ')' ) of inline assembly to your source. I bet,
> >> the plain C code will perform about the same, as your inline
> >> code. I win, when my code is no more than 2 FPS slower, or faster, than
> >> your code (The executable you sent reports 70 FPS here).
> >
> >How many are there such lines in your opinion? :)
>
> I don't understand this question.

I thought you could find something beyond the simple (int)(x)
replacement, and so I asked if there are many such lines. :)

> To elaborate, and make this on-topic again. Some of the code Alexei
> posted uses just "normal" floating point math. He coded almost all
> of this inline. I replaced this by the equivalent C-Code, that
> mostly was already there. Some minor modifications were something
> like
>   /* a=d/c; b=e/c; */ /* This was already there in comments */
> replaced by
>
> #if USEC
>   f=1.0/c; a=d*f; b=e*f;
> #else
>   __asm__ /* ... */
> #endif

Btw, don't forget that this is so only in one place, while other
similar things are already written in C this way. So, it's not a
serious thing. :)

> The same optimization, Alexei has made in his inline assembler.

Yup.

> After this I recompiled, and the speed went up from 70 FPS to 72 FPS.

Oh man, 2/70 = 2.9% :))

> This, I think, proves, that gcc is capable of producing quite efficient
> floating point code.

Surely it proves that. But this is a bit strange, though. :)

> Of course, Alexei's code would have won, if he
> had replaced
>
> __asm__ volatile("fldl (%0)\n ...\nfstpl (%0)" : : "r" (&dbl));

Well, I didn't know that (int)(x) is slower. Btw, I need to take a look
at the .S file. I've not yet seen how this "round" is done.

> (Alexei, you got rid of the "g", but I think, here "memory"
> is needed in the clobber-list. I'm not totally certain, though.)
>
> with
>
> __asm__ volatile("fldl %0\n ...\nfstpl %0" : "=m" (dbl) : "0" (dbl));
>
> This would give gcc more chances to optimize. It uses
> fewer registers, and also needs fewer instructions. I have not tried
> this, but even then I think, the C code would not produce much
> less efficient code, than the inline assembly.

I cleaned up all my source today before your post. :) There are "memory"
clobbers everywhere now. And some other stuff is also improved.
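
On how the "round" is done: a C cast must truncate toward zero, while
the x87 FIST instruction rounds according to the current rounding mode
(round to nearest by default), so the two are not interchangeable. The
sketch below only illustrates that difference in portable C. The names
to_int_trunc and to_int_nearest are hypothetical, and to_int_nearest is
a stand-in that uses the well-known "magic number" trick rather than
FIST itself. It assumes IEEE-754 doubles, the default rounding mode,
double arithmetic evaluated in double precision, and |x| < 2^31.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* What the C language requires for (int)(x): truncation toward zero. */
static int to_int_trunc(double x)
{
    return (int)x;
}

/* Hypothetical stand-in for FIST-style rounding (round to nearest,
   ties to even). Assumes IEEE-754 doubles and the default rounding
   mode; only valid for |x| < 2^31. */
static int to_int_nearest(double x)
{
    /* 1.5 * 2^52: after adding it, one unit in the last place is 1.0,
       so the addition itself rounds away the fraction of x. */
    const double magic = 6755399441055744.0;
    volatile double biased;  /* volatile forces a real 64-bit store */
    double stored;
    uint64_t bits;
    uint32_t low;
    int32_t result;

    assert(x > -2147483648.0 && x < 2147483647.0);

    biased = x + magic;
    stored = biased;
    memcpy(&bits, &stored, sizeof bits);
    low = (uint32_t)bits;    /* low 32 mantissa bits = rounded value */
    memcpy(&result, &low, sizeof result);
    return (int)result;
}
```

With these definitions, 2.7 converts to 2 by truncation and to 3 by
rounding, and -2.7 converts to -2 and -3 respectively. This difference
is why a compiler targeting the x87 has to switch the rounding mode
around the store for a plain cast.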

> Where gcc produces considerably less efficient code, is when you have
>
> int i;
> double a, b;
>
> i = (int)(a*b);
>
> Here, gcc always needs to save and restore the FPU control word, and
> there are a few occurrences of this type in Alexei's code. (I don't
> blame gcc here, I think it is almost impossible to do much better
> for a compiler.)

Stupid thing. It doesn't have to save/load the state of the FPU. I think
that's needed only for such things as ceil() and floor(). (int)(x)
should be w/o save/restore.

>
> I replaced the above code with
>
> /* can be #ifdefed and replaced by
>
>    #define to_int(x) ((int)(x))
>
>    for non gcc and i386, to make it even portable. Comments for other
>    or more efficient methods to do double -> int conversions are welcome. */

Joking? :))

> __inline__ static int to_int(double x)
> {
>   int r;
>   __asm__ volatile ("fistl %0" : "=m" (r) : "t" (x)); /* "t" is for st0 */
>   return r;
> }
>
> ...
> #if USEC2
>   i = to_int(a*b);
> #else
>   __asm__ /* ... */
> #endif
>
> This is essentially, what the inline code of Alexei does. (I have
> not bothered to look up, whether the fistl instruction rounds,
> or chops, so this may be not the same as the C-code.)

Sure, FIST(P)L. :)

> While the to_int function is not optimal (gcc will have to code
> one superfluous fstp instruction, compared to fistpl), it is still
> quite a bit more efficient than C code. With these modifications,
> I got rid of all the other inline assembly. I got 70 FPS, the
> same as the original (either the self compiled sources, or the
> executable Alexei sent to me).
>
> Alexei's code will "cache" some values on the FPU stack, which
> gcc is not able to see (with the switches I used). Nevertheless,
> even here, with the help of only one line of inline assembly,
> it produces comparable results. Again, it would lose, when all
> those references and address-of operations would be omitted.
> It should be clear, that the compiler won't reach the efficiency
> of hand optimized assembler code.
> Whether the relatively small
> difference here is worth all the trouble, ...

Don't forget that my code didn't compile with either -O or -O2 then.
It makes a difference. Note this.

> One last comment, on the T_Map function. The C-code version actually
> got quite a bit slower (5 FPS, IIRC), when compiled with -O2 or -O3,
> compared to -O only. The assembler version, not surprisingly, was
> not affected.
>
> There was one bug in the other part of the sources, that may be of
> general interest.
>
> [All the context omitted (Alexei, it's in your linev)]
> int c; /* only low byte used */
> __asm__ volatile("movb %0, al" : : "g" (c));
>
> This actually compiled with -O2, but got an error with -O by gas.
> It should be clear why - when gcc decides that c will live in memory
> or in a/b/c/dx, it will work, when it is in (say) esi, it won't.
> So, this is a nice example, why "but it works", doesn't buy you
> too much.

I cleaned this up in the morning.

> Alexei, I have made some fun. I hope I have made up for it, by this
> post, that took actually longer to write, than the coding.
> I will send you the modified source by email. The post hopefully
> was of general interest.

Well, let me say a few words in conclusion. ;)

1. You simply proved that GCC has an efficient enough optimizer. Okay,
   I agree. Your code that runs 2 FPS faster for you runs the same for
   me as before. I think that doesn't count as faster than mine (just
   2.9%). So, we have a good optimizer and you proved this. Great. I'm
   glad. This means I can throw away a lot of inline ASM now.

2. If I had known that (int)(x) is slow, and if I had had a proper
   manual on inline ASM, I would have achieved the same with fewer
   problems.

3. Dieter, I hope you won't try to convert span() to plain C. :)
   This replacement doesn't run anywhere near as fast:
--------------8<----------------
  while (n--) {
    *scr++ = *(texture+((v1>>8)&0xFF00)+((u1>>16)&0xFF));
    u1 += du; v1 += dv;
  };
--------------8<----------------

Anyway, thank you. And btw, thanks to myself.
If I hadn't written efficient C code between /* */ :), Dieter would
never have proved that GCC has a good optimizer, because he doesn't
know the tmapping algorithm (do you?). Seems that this is a story that
can teach everyone (me=best example). :))

I think this thread is almost closed. Just one short question is left
(I mean span() :).

Thanks.
Alexei A. Frounze
-----------------------------------------
Homepage: http://alexfru.chat.ru
Mirror: http://members.xoom.com/alexfru
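
For readers who want to try the plain C span loop from point 3, here is
a minimal self-contained sketch. The post does not show the declarations,
so the following are assumptions: the function name span_c, a 256x256
texture with one byte per texel in row-major order, and u1/v1/du/dv as
unsigned 16.16 fixed-point values.

```c
#include <assert.h>
#include <stdint.h>

/* Plain C version of the quoted inner loop.
   Assumed: texture is 256x256 bytes, row-major; u1/v1 are 16.16
   fixed-point texture coordinates; du/dv are the per-pixel steps. */
static void span_c(uint8_t *scr, const uint8_t *texture,
                   uint32_t u1, uint32_t v1,
                   uint32_t du, uint32_t dv, int n)
{
    assert(n >= 0);
    while (n--) {
        /* (v1 >> 8) & 0xFF00 is the integer part of v times 256, i.e.
           the row offset; (u1 >> 16) & 0xFF is the integer part of u,
           i.e. the column. Both wrap around at 256. */
        *scr++ = texture[((v1 >> 8) & 0xFF00) + ((u1 >> 16) & 0xFF)];
        u1 += du;
        v1 += dv;
    }
}
```

Unsigned types are used here so that the shifts and the wrap-around of
the coordinates are well defined in C; the original code may well use
different types.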