From: nikki AT gameboutique DOT co (nikki)
Newsgroups: comp.os.msdos.djgpp
Subject: Re: Allegro perspective-correct .. (fpu memcopy)
Date: 5 Mar 1997 10:34:11 GMT
Organization: GameBoutique Ltd.
Lines: 35
Message-ID: <5fji73$8fo@flex.uunet.pipex.com>
References: <199703050217 DOT MAA15402 AT solwarra DOT gbrmpa DOT gov DOT au>
NNTP-Posting-Host: www.gameboutique.com
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
To: djgpp AT delorie DOT com
DJ-Gateway: from newsgroup comp.os.msdos.djgpp

> Hmmm...you realise if you extend this code to use all 8 registers, you can
> speed it up even more, performing only 2 cache loads per loop. You can also
> remove the addl's by using indexed addressing to save another cycle each
> loop...

ah, but then the loop is over 32 bytes and won't fit in a single cache line, so
the resulting loss probably isn't worth it :( if you have a moment, give it a try
though and see if you can come up with any hard and fast numbers here; my timing
routines suck pretty bad :(
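The loop being discussed isn't quoted in this message, so what follows is only a rough sketch of how a 4-register FPU block copy might look in DJGPP-style GCC inline asm (AT&T syntax). The name fpu_memcpy is hypothetical. The sketch also swaps in fildll/fistpll (64-bit integer load/store) instead of fld/fstp on doubles. A 64-bit integer fits exactly in the 64-bit x87 mantissa, so every bit pattern survives the 64-80-64 round trip. That isn't guaranteed for double loads: a double holding a signalling NaN comes back quietened, for example. The sketch hasn't been timed, so it makes no claims about cycle counts.

```c
#include <stddef.h>
#include <string.h>

#if !defined(__i386__) && !defined(__x86_64__)
#error "this sketch uses x87 FPU instructions and needs an x86 target"
#endif

/* Hypothetical FPU block copy: moves 32 bytes per iteration through four
   x87 registers. Each 8-byte chunk is loaded as a 64-bit integer (fildll)
   and stored back as one (fistpll); the x87 holds any 64-bit integer
   exactly, so the bytes are copied unchanged. */
static void fpu_memcpy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t blocks = n / 32;

    while (blocks--) {
        __asm__ __volatile__ (
            "fildll    (%1)\n\t"   /* push s[0..7]                      */
            "fildll   8(%1)\n\t"
            "fildll  16(%1)\n\t"
            "fildll  24(%1)\n\t"   /* push s[24..31], now in st(0)      */
            "fistpll 24(%0)\n\t"   /* pops come off in reverse order    */
            "fistpll 16(%0)\n\t"
            "fistpll  8(%0)\n\t"
            "fistpll   (%0)"
            :
            : "r" (d), "r" (s)
            /* memory: we write through d; st..st(3): reserve 4 free
               x87 stack slots for the pushes above */
            : "memory", "st", "st(1)", "st(2)", "st(3)");
        d += 32;
        s += 32;
    }
    memcpy(d, s, n % 32);   /* copy any leftover tail bytes normally */
}
```

Unrolling to all 8 registers, as suggested in the quoted text, would mean 8 fildll/fistpll pairs and 64 bytes per iteration, with all eight stack registers clobbered and a longer loop body.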

> I haven't played with this at all actually, because I haven't needed to fully
> optimise yet. But I will have a look at it tonight and see how I go. Have
> you tried putting the FPU into double precision mode before doing this? If

ah, there's a problem there: using 80-bit values will take longer to load :(
it's 3 cycles for an 80-bit load and 1 cycle for a 64-bit load.
how do i change the fpu mode from inline asm like that anyway, btw? i've never
managed to get that to work :(
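One common way to change the precision from GCC inline asm is to read the x87 control word with fnstcw, change the precision-control (PC) field in bits 8-9, and load the result back with fldcw. A minimal sketch, assuming GCC-style extended asm as in DJGPP; the helper names and FPU_PC_* constants are made up for illustration.

```c
#include <assert.h>

#if !defined(__i386__) && !defined(__x86_64__)
#error "this sketch uses x87 FPU instructions and needs an x86 target"
#endif

/* Precision-control (PC) field of the x87 control word: bits 8 and 9. */
#define FPU_PC_MASK     0x0300u
#define FPU_PC_SINGLE   0x0000u   /* 24-bit mantissa */
#define FPU_PC_DOUBLE   0x0200u   /* 53-bit mantissa */
#define FPU_PC_EXTENDED 0x0300u   /* 64-bit mantissa */

/* Read the current control word (fnstcw stores it to memory). */
static unsigned short fpu_get_cw(void)
{
    unsigned short cw;
    __asm__ __volatile__ ("fnstcw %0" : "=m" (cw));
    return cw;
}

/* Load a new control word from memory. */
static void fpu_set_cw(unsigned short cw)
{
    __asm__ __volatile__ ("fldcw %0" : : "m" (cw));
}

/* Change only the PC field and return the old control word so the
   caller can restore it afterwards with fpu_set_cw(). */
static unsigned short fpu_set_precision(unsigned short pc)
{
    unsigned short old;

    assert(pc == FPU_PC_SINGLE || pc == FPU_PC_DOUBLE || pc == FPU_PC_EXTENDED);
    old = fpu_get_cw();
    fpu_set_cw((unsigned short)((old & ~FPU_PC_MASK) | pc));
    return old;
}
```

A caller would do `unsigned short old = fpu_set_precision(FPU_PC_DOUBLE);` before the hot loop and `fpu_set_cw(old);` after it. Bear in mind that the PC field only affects the rounding of arithmetic results (FADD, FSUB, FMUL, FDIV, FSQRT and their variants). fld and fst/fstp still convert between the memory format and the 80-bit register format whatever PC is set to.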

> you do that, the values should be stored as loaded, and no conversion should
> occur. If you are using the FPU in extended precision, it might be causing
> problems with the 64-80-64 bit conversion process. Reducing the precision
> would probably help by causing no conversions to be done...and not run any
> slower because you're still moving 8 bytes at a time...

i suspect that 6-cycle loading instead of 2-cycle loading would cause a
considerable slowdown though :(

regards,
nik


-- 
Graham Tootell           
nikki AT gameboutique DOT com