From: buers AT gmx DOT de (Dieter Buerssner)
Newsgroups: comp.os.msdos.djgpp
Subject: Re: inefficiency of GCC output code & -O problem
Date: 18 Apr 2000 06:31:07 GMT
Lines: 78
Message-ID: <8dh6kr.3vvqvqr.0@buerssner-17104.user.cis.dfn.de>
References: <38F9D717 DOT 9438A3F6 AT mtu-net DOT ru>
 <8df84a DOT 3vvqu6v DOT 0 AT buerssner-17104 DOT user DOT cis DOT dfn DOT de>
 <38FB4094 DOT DE7B5F4C AT mtu-net DOT ru>
 <8dfum2 DOT 3vvqu6v DOT 0 AT buerssner-17104 DOT user DOT cis DOT dfn DOT de>
 <38FB7858 DOT 41B090DB AT mtu-net DOT ru>
NNTP-Posting-Host: pec-104-133.tnt5.s2.uunet.de (149.225.104.133)
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 8bit
X-Trace: fu-berlin.de 956039467 8119095 149.225.104.133 (16 [17104])
X-Posting-Agent: Hamster/1.3.13.0
User-Agent: Xnews/03.02.04
To: djgpp AT delorie DOT com
DJ-Gateway: from newsgroup comp.os.msdos.djgpp
Reply-To: djgpp AT delorie DOT com

Alexei A. Frounze wrote:

>3. Dieter, I hope you won't try to convert span() to plane C. :)
                                                      ^^^^^
(Nice misspelling. With an optimizing plane C-compiler, you shouldn't
need any assembly for 3d graphics ;)

Sorry, I must disappoint you.

>This replacement doesn't work even nearly fast:
>  while (n--) {
>    *scr++ = *(texture+((v1>>8)&0xFF00)+((u1>>16)&0xFF));
>    u1 += du;
>    v1 += dv;
>  };
    ^
Why this semicolon? I see the same thing everywhere in your sources.

Assuming n >= 0, and taking the liberty of slightly changing your
interface (the pointers are not needed), I got this after a few minutes:

/* Add this to the top of T_Map() */
static void span2(char *scr, char *texture, int n,
                  int u1, int v1, int du, int dv)
{
  /* First draw the n%4 leftover pixels; the cases fall through on purpose. */
  switch (n&3)
  {
    case 3:
      *scr++ = texture[((v1>>8)&0xFF00)+((u1>>16)&0xFF)];
      u1 += du;
      v1 += dv;
      /* fall through */
    case 2:
      *scr++ = texture[((v1>>8)&0xFF00)+((u1>>16)&0xFF)];
      u1 += du;
      v1 += dv;
      /* fall through */
    case 1:
      *scr++ = texture[((v1>>8)&0xFF00)+((u1>>16)&0xFF)];
      u1 += du;
      v1 += dv;
  }
  /* Then draw the remaining pixels four at a time. */
  if ((n >>= 2) != 0)
  {
    do
    {
      scr[0] = texture[((v1>>8)&0xFF00)+((u1>>16)&0xFF)];
      u1 += du;
      v1 += dv;
      scr[1] = texture[((v1>>8)&0xFF00)+((u1>>16)&0xFF)];
      u1 += du;
      v1 += dv;
      scr[2] = texture[((v1>>8)&0xFF00)+((u1>>16)&0xFF)];
      u1 += du;
      v1 += dv;
      scr[3] = texture[((v1>>8)&0xFF00)+((u1>>16)&0xFF)];
      u1 += du;
      v1 += dv;
      scr += 4;
    }
    while (--n != 0);
  }
}

I replaced

  span (scr, texture, n, &u1, &v1, du, dv);

by

  span2(scr, texture, n, u1, v1, du, dv);

in T_Map(). Speed went up by 2 FPS ;)

I must admit that this is really surprising. A quick look at
your assembly implementation has shown: I don't understand it.
And I actually feel no desire at all to understand it. But it
certainly looks fast. So, your results may differ.
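
For anyone who wants to convince themselves that the unrolled span2 writes exactly the same pixels as the plain while (n--) loop, here is a minimal sketch of a check. The names span_simple, span_unrolled, spans_agree, TEX_SIZE and MAX_SPAN, as well as the texture fill pattern and the start/step values, are assumptions made up for this illustration and are not taken from the original sources. It also assumes non-negative 16.16 fixed-point coordinates, so the right shifts stay well defined.

```c
#include <assert.h>
#include <string.h>

#define TEX_SIZE (256 * 256)  /* assumed: 256x256 texture, 1 byte per texel */
#define MAX_SPAN 64           /* assumed: longest span the check exercises */

/* The plain loop from the quoted message (hypothetical name). */
static void span_simple(char *scr, const char *texture, int n,
                        int u1, int v1, int du, int dv)
{
    while (n--) {
        *scr++ = texture[((v1 >> 8) & 0xFF00) + ((u1 >> 16) & 0xFF)];
        u1 += du;
        v1 += dv;
    }
}

/* Same structure as span2: n%4 leftovers first, then 4 pixels per pass. */
static void span_unrolled(char *scr, const char *texture, int n,
                          int u1, int v1, int du, int dv)
{
    switch (n & 3) {
    case 3:
        *scr++ = texture[((v1 >> 8) & 0xFF00) + ((u1 >> 16) & 0xFF)];
        u1 += du; v1 += dv;
        /* fall through */
    case 2:
        *scr++ = texture[((v1 >> 8) & 0xFF00) + ((u1 >> 16) & 0xFF)];
        u1 += du; v1 += dv;
        /* fall through */
    case 1:
        *scr++ = texture[((v1 >> 8) & 0xFF00) + ((u1 >> 16) & 0xFF)];
        u1 += du; v1 += dv;
    }
    for (n >>= 2; n > 0; n--) {
        int i;
        for (i = 0; i < 4; i++) {  /* a compiler may unroll this itself */
            scr[i] = texture[((v1 >> 8) & 0xFF00) + ((u1 >> 16) & 0xFF)];
            u1 += du; v1 += dv;
        }
        scr += 4;
    }
}

/* Returns 1 if both versions produce identical output for a span of n. */
int spans_agree(int n)
{
    static char texture[TEX_SIZE];
    char a[MAX_SPAN], b[MAX_SPAN];
    int i;

    assert(n >= 0 && n <= MAX_SPAN);
    for (i = 0; i < TEX_SIZE; i++)
        texture[i] = (char)((i * 7 + 3) & 0x7F);  /* arbitrary test pattern */
    memset(a, 0, sizeof a);
    memset(b, 0, sizeof b);

    /* Arbitrary non-negative 16.16 start values and steps (1.5 and 0.75). */
    span_simple(a, texture, n, 5 << 16, 9 << 16, 0x18000, 0x0C000);
    span_unrolled(b, texture, n, 5 << 16, 9 << 16, 0x18000, 0x0C000);

    /* Comparing the whole buffer also catches writes past pixel n. */
    return memcmp(a, b, sizeof a) == 0;
}
```

Calling spans_agree(n) for every n from 0 to MAX_SPAN covers all four values of n&3, including the n == 0 case where nothing should be written. This only checks correctness; whether the unrolled form is actually faster depends on the compiler and the machine.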