From: nikki AT gameboutique DOT co (nikki)
Newsgroups: comp.os.msdos.djgpp
Subject: Re: Allegro perspective-correct .. (fpu memcopy)
Date: 5 Mar 1997 10:34:11 GMT
Organization: GameBoutique Ltd.
Lines: 35
Message-ID: <5fji73$8fo@flex.uunet.pipex.com>
References: <199703050217 DOT MAA15402 AT solwarra DOT gbrmpa DOT gov DOT au>
NNTP-Posting-Host: www.gameboutique.com
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
To: djgpp AT delorie DOT com
DJ-Gateway: from newsgroup comp.os.msdos.djgpp

> Hmmm...you realise if you extend this code to use all 8 registers, you can
> speed it up even more, performing only 2 cache loads per loop. You can also
> remove the addl's by using indexed addressing to save another cycle each
> loop...

ah, but then it's >32 bytes and won't fit in a cache line. the resulting loss
is probably not worth it therefore :(
if you have a moment give it a try though and see if you can come up with any
hard and fast values here, my timing routines suck pretty bad :(

> I haven't played with this at all actually, because I haven't needed to fully
> optimise yet. But I will have a look at it tonight and see how I go. Have
> you tried putting the FPU into double precision mode before doing this? If

ah there's a problem there. using 80bit values will take longer to load :(
it's 3 cycles for an 80bit load and 1 cycle for a 64bit load.
how do i change the fpu mode in inline asm like that anyway btw? i haven't
ever managed to get that to work :(

> you do that, the values should be stored as loaded, and no conversion should
> occur. If you are using the FPU in extended precision, it might be causing
> problems with the 64-80-64 bit conversion process. Reducing the precision
> would probably help by causing no conversions to be done...and not run any
> slower because you're still moving 8 bytes at a time...

i suspect the 6 cycle loading rather than 2 cycle loading now causes
considerable slowdown though :(

regards,
nik

--
Graham Tootell        nikki AT gameboutique DOT com
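
On the question above about changing the FPU mode from inline asm: a minimal
sketch, assuming GCC-style extended asm (which DJGPP's gcc accepts). The
precision setting lives in bits 8-9 of the x87 control word, which can be read
with fnstcw and written with fldcw. The function and macro names below are
made up for this example and do not come from the thread or from any library.
One caveat, as far as the Intel documentation goes: the precision-control
field only governs the rounding of arithmetic results (add, subtract,
multiply, divide, square root), so it may well leave the cost and behaviour of
plain loads and stores unchanged.

```c
/* Sketch only: set the x87 precision-control (PC) field via GCC inline asm.
   All names here are hypothetical, chosen for illustration. */

#define X87_PC_MASK     0x0300u  /* control word bits 8-9 */
#define X87_PC_SINGLE   0x0000u  /* 24-bit significand */
#define X87_PC_DOUBLE   0x0200u  /* 53-bit significand */
#define X87_PC_EXTENDED 0x0300u  /* 64-bit significand (state after finit) */

#if defined(__i386__) || defined(__x86_64__)

/* Read the current control word into memory without checking exceptions. */
static unsigned short x87_get_cw(void)
{
    unsigned short cw;
    __asm__ __volatile__ ("fnstcw %0" : "=m" (cw));
    return cw;
}

/* Load a new control word from memory. */
static void x87_set_cw(unsigned short cw)
{
    __asm__ __volatile__ ("fldcw %0" : : "m" (cw));
}

#else

/* No x87 on this target: keep a shadow copy purely so the sketch compiles. */
static unsigned short x87_shadow_cw = 0x037Fu;
static unsigned short x87_get_cw(void) { return x87_shadow_cw; }
static void x87_set_cw(unsigned short cw) { x87_shadow_cw = cw; }

#endif

/* Replace only the PC bits; return the old control word for later restore. */
static unsigned short x87_set_precision(unsigned short pc)
{
    unsigned short old = x87_get_cw();
    x87_set_cw((unsigned short)((old & ~X87_PC_MASK) | (pc & X87_PC_MASK)));
    return old;
}
```

Typical use would be old = x87_set_precision(X87_PC_DOUBLE); before the copy
loop and x87_set_cw(old); afterwards, so the rest of the program still sees
the precision it started with.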