Message-ID: <33731766.92F@silesia.top.pl>
Date: Fri, 09 May 1997 14:24:06 +0200
From: Michal
MIME-Version: 1.0
To: djgpp AT delorie DOT com
Subject: Re: Alignment
References: <199705080655 DOT QAA17483 AT solwarra DOT gbrmpa DOT gov DOT au>
Content-Type: text/plain; charset=iso-8859-2
Content-Transfer-Encoding: 7bit
Precedence: bulk

Leath Muller wrote:
>
> *EWWWW* Self modifying code on a PPro is an absolute no-no... your code will
> probably die in a major way on a PPro...
>
I modify the code once per triangle. It rules; my affine mapper takes
9 clocks per pixel (it can be done in 8, but I would have to lose some
features, and with a large number of triangles it would be slower).
It uses 64x64 textures, can repeat them within a triangle, uses a lookup
table for lighting, lets the texture have any offset, and uses 6:26 fixed
point for texture u and v and 8:24 fixed point for lighting.

>
> What is your code based on? Heckers?
>
What does -Heckers- mean?

>
> You lost me a bit there... what are you talking about a constant offset?
> I use one table which has lighting information based on the source value. In
> true colour, each R, G and B light component can be an 8 bit value. Each
> source texel RGB component value is an 8 bit value. Combining the two you get a
> 16bit value which is the result of the lit texel in the right colour... and
> its automatically calculated with the segmented registers... in other words,
> I only need one 65536 (16 bit) lookup table because I can use the same table
> for all components...
>
You have to look up that table 3 times per pixel. Is your lighting
perspective correct? I still don't understand why you do the lighting in
a second loop. That means running a second loop and writing to the screen
a second time. The extra registers would not have made up for it (at
least in my case).

> It _can_ be more than one clock on slower Pentium class machines (generally
> less than 133's) which is what is still the major market share at the moment.
>
Are you preloading a cache line once per pixel?
If not, it doesn't make any sense: when your u delta (y in the texture)
is larger than 1/8, you're going to skip lines and have cache misses
anyway. If your v delta (x in the texture) is larger than 4 (skipping
3 texels every pixel), you end up with cache misses too
(dv > 4 => dv*8 > 32, the cache line size). The same goes for the
lighting lookup table. This is for 8-bit color.

>
> Question: Are you saying you calculate the deltas every time you render
> a scan line?
Anyway, I calculate the deltas for the first 8 pixels of every scanline.
I need 2 fdivs for this: the first to get the initial u, v & light values,
the second to get them after 8 pixels. I even thought of interpolating the
deltas and the initial u, v & light values linearly across every
8 scanlines, but it would make a real mess of my procedure. That
per-scanline setup is the real problem in my procedure; in a real frame
render it causes a dramatic slowdown.

> if you're doing 8 pixel scanlines (complete) just map
> the entire line affinely (if I understand what you're saying correctly... :)
I'm doing that (of course), but I still need the deltas for those 8 pixels.

> The way I did it, I simply started the first fdiv which took 19 cycles,
> did some integer stuff, did the second fdiv (another 19) with more integer
> stuff in parallel, and mapped them affinely getting about 6 cycles per pixel.
>
Can you elaborate more on this?
>
I need 2 fdivs, 6 fmuls, and other stuff even if the scanline is only
2 pixels wide. Of course I could write special-case code for that, but
then (for example) 4-pixel-long scanlines would look incorrect, and most
of my scanlines are longer than 8 pixels. I can't do integer work in
parallel with the first fdiv (or the second) because I need its result
before I can go any further.
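
For reference, the 8-pixel span setup described above can be sketched in C
roughly as follows. This is only an illustration under assumptions, not the
actual routine: the names Persp, Span, to_fix, setup_span and texel_index are
made up here, u/z, v/z, light/z and 1/z are assumed to vary linearly along
the scanline, and u is taken as the texture row and v as the column, as in
the cache discussion above.

```c
#include <assert.h>
#include <stdint.h>

#define SPAN    8    /* pixels between perspective-correct points */
#define UV_FRAC 26   /* 6:26 fixed point for u and v (64x64 texture) */
#define L_FRAC  24   /* 8:24 fixed point for light */

/* Hypothetical layout: u/z, v/z, light/z and 1/z at one pixel. */
typedef struct { double uoz, voz, loz, ooz; } Persp;

/* Start values and per-pixel steps for one span, all in fixed point. */
typedef struct {
    uint32_t u, v, l;
    uint32_t du, dv, dl;
} Span;

/* Convert to unsigned fixed point. Negative values wrap modulo 2^32,
   so adding a "negative" step still moves backwards. */
uint32_t to_fix(double x, int frac_bits)
{
    return (uint32_t)(int64_t)(x * (double)(1u << frac_bits));
}

/* Two divides per span: one at the first pixel, one SPAN pixels later. */
Span setup_span(Persp p, Persp step)
{
    double ooz1 = p.ooz + SPAN * step.ooz;
    assert(p.ooz > 0.0 && ooz1 > 0.0);

    double z0 = 1.0 / p.ooz;   /* first divide  */
    double z1 = 1.0 / ooz1;    /* second divide */

    /* Six multiplies by z recover u, v and light at both ends. */
    double u0 = p.uoz * z0, v0 = p.voz * z0, l0 = p.loz * z0;
    double u1 = (p.uoz + SPAN * step.uoz) * z1;
    double v1 = (p.voz + SPAN * step.voz) * z1;
    double l1 = (p.loz + SPAN * step.loz) * z1;

    Span s;
    s.u  = to_fix(u0, UV_FRAC);
    s.v  = to_fix(v0, UV_FRAC);
    s.l  = to_fix(l0, L_FRAC);
    s.du = to_fix((u1 - u0) / SPAN, UV_FRAC);
    s.dv = to_fix((v1 - v0) / SPAN, UV_FRAC);
    s.dl = to_fix((l1 - l0) / SPAN, L_FRAC);
    return s;
}

/* Texel offset in a 64x64, 8-bit texture at pixel i of the span.
   The top 6 bits are the integer part, so the texture repeats for free. */
unsigned texel_index(const Span *s, int i)
{
    uint32_t u = s->u + (uint32_t)i * s->du;
    uint32_t v = s->v + (uint32_t)i * s->dv;
    return (unsigned)((u >> UV_FRAC) * 64u + (v >> UV_FRAC));
}
```

The inner loop then only adds du, dv and dl per pixel. Both reciprocals have
to finish before any of the multiplies can start, which is why there is
little integer work left to overlap with them in this arrangement.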