Message-ID: <336DCFDB.7C54@silesia.top.pl>
Date: Mon, 05 May 1997 14:17:31 +0200
From: Michal
To: djgpp AT delorie DOT com
Subject: Re: Alignment

Leath Muller wrote:
>
> No - you're wrong... :) The fdiv, fsqrt, fmul, fadd and fsub are all affected
> by moving the FPU into single precision mode...
>

You're saying that if I have my FPU in double precision mode and execute, for example, "fmul %st(1)", the FPU is switched into single precision and, after executing the fmul, back to double precision? If that's right, why do we have different precisions at all, when all operations are really single?

> I also get the impression then that you're texturing 8 pixels, lighting 8
> pixels, texturing 8 pixels, lighting... etc... Basically, this is _really_
> bad for cache coherency - you're better off texturing the complete scanline
> and then lighting the complete scanline.

No, I'm doing both at the same time.

> I moved to this approach, using a
> temporary offscreen memory buffer of 2560 bytes (I do stuff in true colour).
> Write the texture stuff to the offscreen memory (which in my inner loop
> never left the 8k cache area per line), and then do your lighting from there...
>

I don't think that would be faster. You would need a buffer to store 1/z for every 8 pixels, unless you do the division a second time. Also, some pixels of the scanline could already be out of the cache by the time they are written for the second time. The only advantage I see is more free registers for both texturing and lighting, but you need more instructions: writing to the 1/z buffer, calculating the address a second time, a second loop and so on.

> If you're wondering, I had my perspective correct, sub-pixel accurate, true
> colour, light-sourced, Gouraud shaded engine running at 16 cycles per pixel.

Mine draws about 7.8 million pixels per second, writing to the LFB (ViRGE) on my P120 in 8-bit color.
That's about 15.4 cycles per pixel (120,000,000 / 7,800,000), but that's with cache misses. I've never measured it that precisely, but without the misses I think it would be around 12-13 clocks per pixel. Your result is quite good, in clocks per pixel I mean. In 24-bit color your inner loop must be slowed down dramatically by cache misses: your textures take 3 bytes per pixel, so there is 3 times more memory to address. I think it would be better to work in 16-bit color with 1-byte-per-pixel textures, and to organize your palette so that the high byte of each color (in the palette) is the brightness and the low byte is the actual color value (taken from the texture). Adapted to that, my inner loop would in theory run at the same speed, but it would have more cache misses. I've never coded for 16-bit color, never even tried.

> With MMX registers, I could get it running in 9 cycles per pixel... which
> is faster than Quake and looks a whole lot better...

I don't have MMX, so I can't say for sure, but I don't think it can give such a speed-up. You can't use MMX and the FPU at the same time (MMX reuses the FPU registers), so you would have to write an inner loop without FPU code. The whole benefit of overlapping FPU code with integer code would be lost. Also, MMX has no divide instruction (as far as I know), so you would have to use the integer div. Maybe it can be done by saving the FPU registers in a buffer and then loading the MMX registers, or something like that.

Inner-loop speed is not everything; try to create a whole engine like QUAKE, that's the really difficult task.

PS: What about my first question, 'How to align in DJGPP'? My doubles are NOT aligned on an 8-byte boundary like they should be.

Sorry my answer is so late, but I had problems with my internet provider.
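The "1/z for every 8 pixels" scheme both posters rely on can be sketched in C. This is a minimal illustration, not either poster's inner loop: it assumes u/z, v/z and 1/z are stepped linearly across the span (the `Span` struct and `span_uv` function are hypothetical names made up here), takes the perspective reciprocal only at every 8th pixel, and interpolates u and v linearly in between. A real inner loop of the time would be hand-written asm that overlaps the next FPU divide with the integer texel work, which this sketch does not attempt.

```c
#include <assert.h>

/* Hypothetical span setup: u/z, v/z and 1/z vary linearly in screen x,
   so each is given at the first pixel plus a per-pixel step. */
typedef struct {
    float uoz, voz, ooz;    /* u/z, v/z, 1/z at the first pixel */
    float duoz, dvoz, dooz; /* per-pixel increments */
} Span;

#define SUBDIV 8 /* perspective divide every 8 pixels */

/* Fills u[0..n-1] and v[0..n-1] with texture coordinates for one span.
   u and v are exact at pixel 0 and at every 8th pixel; between those
   points they are interpolated linearly. */
void span_uv(const Span *s, int n, float *u, float *v)
{
    assert(n >= 0 && s->ooz != 0.0f);

    float uoz = s->uoz, voz = s->voz, ooz = s->ooz;
    float z0 = 1.0f / ooz;              /* reciprocal at the span start */
    float u0 = uoz * z0, v0 = voz * z0;
    int x = 0;

    while (x < n) {
        int len = (n - x < SUBDIV) ? n - x : SUBDIV;

        /* step the linear quantities to the far end of this sub-span */
        uoz += s->duoz * len;
        voz += s->dvoz * len;
        ooz += s->dooz * len;

        float z1 = 1.0f / ooz;          /* one reciprocal per sub-span */
        float u1 = uoz * z1, v1 = voz * z1;

        /* 1/SUBDIV is a compile-time constant; only a short tail span
           needs a real division here */
        float inv_len = (len == SUBDIV) ? 1.0f / SUBDIV : 1.0f / len;
        float du = (u1 - u0) * inv_len, dv = (v1 - v0) * inv_len;

        for (int i = 0; i < len; i++) {
            u[x + i] = u0 + du * i;
            v[x + i] = v0 + dv * i;
        }
        u0 = u1;
        v0 = v1;
        x += len;
    }
}
```

For a 20-pixel span this works in sub-spans of 8 + 8 + 4 pixels, so it takes four reciprocals (one at the start, one per sub-span) instead of twenty; the price is a small affine error inside each sub-span.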
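On the alignment question in the PS: DJGPP's compiler is a port of GCC, and GCC accepts `__attribute__((aligned(N)))` on variables and on types. The sketch below is a minimal example that assumes static (global) data only; whether the attribute is honoured for local variables on the stack depends on the compiler version and on the stack pointer itself being 8-byte aligned, so that case is not covered. The names `texture_scale`, `struct vertex`, `is_aligned8` and `CHECK_ALIGNED8` are hypothetical. For structs that mix `double` with 4-byte members, GCC's i386 option `-malign-double` is the related switch; it changes struct layout, so all code sharing those structs must be built the same way.

```c
#include <assert.h>
#include <stdint.h>

/* GCC's aligned attribute on a variable: request an 8-byte boundary. */
static double texture_scale __attribute__((aligned(8))) = 1.0;

/* Hypothetical record of doubles. Aligning the type puts each array
   element on an 8-byte boundary; because every member is a double,
   x, y and z then sit at offsets 0, 8 and 16. */
struct vertex {
    double x, y, z;
} __attribute__((aligned(8)));

static struct vertex verts[16];

/* Nonzero if p lies on an 8-byte boundary. */
static int is_aligned8(const void *p)
{
    return ((uintptr_t)p & 7u) == 0;
}

/* Debug-build check for doubles that should be 8-byte aligned. */
#define CHECK_ALIGNED8(p) assert(is_aligned8(p))
```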