Mail Archives: djgpp/2000/04/18/03:34:44

From: buers AT gmx DOT de (Dieter Buerssner)
Newsgroups: comp.os.msdos.djgpp
Subject: Re: inefficiency of GCC output code & -O problem
Date: 18 Apr 2000 06:31:07 GMT
Lines: 78
Message-ID: <8dh6kr.3vvqvqr.0@buerssner-17104.user.cis.dfn.de>
References: <Pine DOT LNX DOT 4 DOT 10 DOT 10004161837540 DOT 1138-100000 AT darkstar DOT grendel DOT net> <38F9D717 DOT 9438A3F6 AT mtu-net DOT ru> <8df84a DOT 3vvqu6v DOT 0 AT buerssner-17104 DOT user DOT cis DOT dfn DOT de> <38FB4094 DOT DE7B5F4C AT mtu-net DOT ru> <8dfum2 DOT 3vvqu6v DOT 0 AT buerssner-17104 DOT user DOT cis DOT dfn DOT de> <38FB7858 DOT 41B090DB AT mtu-net DOT ru>
NNTP-Posting-Host: pec-104-133.tnt5.s2.uunet.de (149.225.104.133)
Mime-Version: 1.0
X-Trace: fu-berlin.de 956039467 8119095 149.225.104.133 (16 [17104])
X-Posting-Agent: Hamster/1.3.13.0
User-Agent: Xnews/03.02.04
To: djgpp AT delorie DOT com
DJ-Gateway: from newsgroup comp.os.msdos.djgpp
Reply-To: djgpp AT delorie DOT com

Alexei A. Frounze wrote:

>3. Dieter, I hope you won't try to convert span() to plane C. :)
                                                      ^^^^^
(Nice misspelling. With optimizing plane C-compiler, you shouldn't
need any assembly for 3d graphics ;)

Sorry, I must disappoint you.

>This replacement doesn't work even nearly fast:

>      while (n--) {
>        *scr++ = *(texture+((v1>>8)&0xFF00)+((u1>>16)&0xFF));
>        u1 += du;
>        v1 += dv;
>      };
        ^ 
Why this semicolon? The same thing I see everywhere in your sources.
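
To make the point concrete: the ';' after the closing brace is an empty
statement. It is harmless after a while loop, but the same habit after an
if block breaks a following else. A small sketch (count_down is a made-up
name for illustration, it is not from your sources):

```c
/* Hypothetical helper, only here to show the empty statement. */
static int
count_down(int n)
{
  int steps = 0;

  while (n--) {
    steps++;
  };   /* this ';' is a separate, empty statement: legal, but it does nothing */

  /* The same habit is not harmless everywhere. This would not compile,
     because the ';' ends the if statement before the else is reached:

       if (steps) { steps--; }; else { steps++; }
  */
  return steps;
}
```

Calling count_down(5) runs the loop body five times, with or without the
extra semicolon.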

Assuming n >= 0, and taking the liberty of slightly changing
your interface (the pointers are not needed), I came up with
this after a few minutes:

/* Add this to the top of T_Map() */
static void 
span2(char *scr, char *texture, int n, int u1, int v1, int du, int dv)
{
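  /* First the n%4 leftover pixels; the cases fall through on purpose. */
  switch (n&3)
  {
    case 3:
      *scr++ = texture[((v1>>8)&0xFF00)+((u1>>16)&0xFF)];
      u1 += du;
      v1 += dv;
      /* fall through */
    case 2:
      *scr++ = texture[((v1>>8)&0xFF00)+((u1>>16)&0xFF)];
      u1 += du;
      v1 += dv;
      /* fall through */
    case 1:
      *scr++ = texture[((v1>>8)&0xFF00)+((u1>>16)&0xFF)];
      u1 += du;
      v1 += dv;
  }
  /* Then the remaining pixels, four per iteration. */
  if ((n >>= 2) != 0)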
  {
    do
    {
      scr[0] = texture[((v1>>8)&0xFF00)+((u1>>16)&0xFF)];
      u1 += du;
      v1 += dv;
      scr[1] = texture[((v1>>8)&0xFF00)+((u1>>16)&0xFF)];
      u1 += du;
      v1 += dv;
      scr[2] = texture[((v1>>8)&0xFF00)+((u1>>16)&0xFF)];
      u1 += du;
      v1 += dv;
      scr[3] = texture[((v1>>8)&0xFF00)+((u1>>16)&0xFF)];
      u1 += du;
      v1 += dv;
      scr += 4;
    }
    while (--n != 0);
  }
}
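
In case the index expression looks cryptic: as far as I can tell from the
shifts and masks, u1 and v1 are 16.16 fixed-point coordinates and the
texture is 256x256 with one byte per texel. That is my reading of the code,
not something stated in your sources. Under that assumption, the expression
is just row*256 + column with the shift and the multiply folded together.
A sketch with two made-up helper names:

```c
/* Hypothetical helper: the index computed step by step.
   Assumes 16.16 fixed-point u, v and a 256x256 byte texture. */
static int
texel_index(int u, int v)
{
  int col = (u >> 16) & 0xFF;   /* integer part of u, wrapped to 0..255 */
  int row = (v >> 16) & 0xFF;   /* integer part of v, wrapped to 0..255 */

  return (row << 8) + col;      /* row * 256 + col */
}

/* The same index, written the way span2() writes it:
   (v >> 8) & 0xFF00 moves bits 16..23 of v straight to bits 8..15. */
static int
texel_index_fast(int u, int v)
{
  return ((v >> 8) & 0xFF00) + ((u >> 16) & 0xFF);
}
```

Both functions return the same value for any u and v; the second form only
saves one shift per pixel.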

I replaced

      span (scr, texture, n, &u1, &v1, du, dv);

by
      
      span2(scr, texture, n, u1, v1, du, dv);

in T_Map(). Speed went up by 2 FPS ;)

I must admit that this is really surprising. A quick look at
your assembly implementation has shown that I don't understand it,
and I actually feel no desire at all to understand it.
But it certainly looks fast, so your results may differ.
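
If you want to convince yourself that the unrolled version writes the same
pixels as the simple loop, something like the following should do. It is
only a sketch: TEXEL, span_simple, span_unrolled and count_mismatches are
names I made up, span_unrolled is a condensed restatement of span2() above,
and the texture contents and step values are arbitrary test data.

```c
#include <string.h>

/* One texel fetch, same index expression as in the message. */
#define TEXEL(t, u, v) ((t)[(((v)>>8)&0xFF00)+(((u)>>16)&0xFF)])

/* The simple loop from the quoted message. */
static void
span_simple(char *scr, char *texture, int n, int u1, int v1, int du, int dv)
{
  while (n--) {
    *scr++ = TEXEL(texture, u1, v1);
    u1 += du;
    v1 += dv;
  }
}

/* Condensed restatement of span2(): leftovers first, then groups of 4. */
static void
span_unrolled(char *scr, char *texture, int n, int u1, int v1, int du, int dv)
{
  switch (n&3)
  {
    case 3: *scr++ = TEXEL(texture, u1, v1); u1 += du; v1 += dv; /* fall through */
    case 2: *scr++ = TEXEL(texture, u1, v1); u1 += du; v1 += dv; /* fall through */
    case 1: *scr++ = TEXEL(texture, u1, v1); u1 += du; v1 += dv;
  }
  for (n >>= 2; n > 0; n--)
  {
    scr[0] = TEXEL(texture, u1, v1); u1 += du; v1 += dv;
    scr[1] = TEXEL(texture, u1, v1); u1 += du; v1 += dv;
    scr[2] = TEXEL(texture, u1, v1); u1 += du; v1 += dv;
    scr[3] = TEXEL(texture, u1, v1); u1 += du; v1 += dv;
    scr += 4;
  }
}

/* Returns the number of span lengths in 0..max_n (capped at 256)
   for which the two versions produce different output. */
static int
count_mismatches(int max_n)
{
  static char texture[65536];
  char a[256], b[256];
  int i, n, bad = 0;

  for (i = 0; i < 65536; i++)
    texture[i] = (char)((i * 7 + (i >> 8)) & 0x7F);   /* arbitrary pattern */

  for (n = 0; n <= max_n && n <= 256; n++)
  {
    memset(a, 0, sizeof a);
    memset(b, 0, sizeof b);
    span_simple(a, texture, n, 3<<16, 5<<16, 0x18000, 0xC000);
    span_unrolled(b, texture, n, 3<<16, 5<<16, 0x18000, 0xC000);
    if (memcmp(a, b, sizeof a) != 0)   /* also catches writes past n */
      bad++;
  }
  return bad;
}
```

count_mismatches(67) should return 0. This only checks correctness, of
course; it says nothing about the speed.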
