From: nikki AT gameboutique DOT co (nikki)
Newsgroups: comp.os.msdos.djgpp
Subject: Re: memcpy(); is there something faster?
Date: 26 Feb 1997 09:54:26 GMT
Organization: GameBoutique Ltd.
Lines: 33
Message-ID: <5f118i$iqt@flex.uunet.pipex.com>
References: <59g08k$758_001 AT cpe DOT Maroochydore DOT aone DOT net DOT au>
    <5euboi$296 AT flex DOT uunet DOT pipex DOT com> <5f05mt$s7s$1 AT doffen DOT uninett DOT no>
NNTP-Posting-Host: www.gameboutique.com
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
To: djgpp AT delorie DOT com
DJ-Gateway: from newsgroup comp.os.msdos.djgpp

> how can they cause an exception? When moving 8 bytes at a time (with
> fild/fistp) you're using integers all the way. If, on the other hand, you use
> fld/fstp (and can copy up to 10 bytes at a time, but lose accuracy and get
> exceptions and additional cache misses), I understand.

well, exactly as you said really. using fild/fistp is fine: you can move
16 bytes in 16 cycles with half as many write cache misses as a normal copy.
unfortunately this is way too slow on anything other than a pentium, whether
older or newer. now if you could replace those nasty non-pairable (NP)
instructions (fistp) with a nice fstp, you'd save lots of cycles and the loop
would be very beneficial indeed.
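For anyone who wants the data-movement pattern without the x87 details, here is a minimal portable C sketch, assuming a hosted C99 compiler. `copy16_int64` is a hypothetical name, not a DJGPP or libc function. Like the fild/fistp loop, it moves two 8-byte chunks per iteration and treats them as plain 64-bit integers, so every bit pattern comes through unchanged. It does not reproduce the FPU instructions or their timing: the compiler picks its own instructions, so the cycle and cache-miss figures above do not apply to it.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not a library function): copies n bytes from src
   to dst, moving two 8-byte integer chunks per loop iteration, the same
   data movement as a fild/fild/fistp/fistp loop. memcpy into a uint64_t
   is the standard C idiom for an unaligned, aliasing-safe 64-bit load or
   store; the compiler chooses the real instructions (not x87 ones). */
void *copy16_int64(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    /* main loop: 16 bytes per iteration as two 64-bit integer moves */
    while (n >= 16) {
        uint64_t a, b;
        memcpy(&a, s, 8);     /* load first chunk (the fild step)   */
        memcpy(&b, s + 8, 8); /* load second chunk                  */
        memcpy(d, &a, 8);     /* store first chunk (the fistp step) */
        memcpy(d + 8, &b, 8); /* store second chunk                 */
        s += 16;
        d += 16;
        n -= 16;
    }

    /* tail: copy the remaining 0..15 bytes one at a time */
    while (n--)
        *d++ = *s++;

    return dst;
}
```

Called as `copy16_int64(dst, src, len)`, it behaves like memcpy for non-overlapping buffers.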
it occurred to me that if you knew where on the stack fild had loaded these
values, could you not take them off with 2 movl's instead and artificially
pop the stack? i.e. do the store part with integer instructions. this would
give a gain of about 4 cycles per 16 bytes, i think.
4/3 bytes/cycle isn't bad after all. however, i suspect rep movsd will be
faster still.
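Taking the rough Pentium cycle estimates above at face value (they are estimates, not measurements), the arithmetic behind the 4/3 figure is:

```latex
\frac{16\ \text{bytes}}{16\ \text{cycles}} = 1\ \text{byte/cycle},
\qquad
\frac{16\ \text{bytes}}{(16 - 4)\ \text{cycles}} = \frac{16}{12} = \frac{4}{3}\ \text{bytes/cycle}
```

The first fraction is the plain fild/fistp loop; the second assumes the hypothesised saving of about 4 cycles per 16 bytes from doing the stores with integer moves.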

regards,
nik


> 
> 
>: 'safe' way which gives you a transfer rate of 16 bytes/16 cycles and 1/2 as
>: many write cache misses. if it's 686 or higher you get 1/4 as many.
>: sadly, if it's 686 or higher the rep movsd will go faster ;(
>: basically fpu memcopy is not all it's cracked up to be.
> 
> who's got a 686 anyway (or PPro?) ?

-- 
Graham Tootell           
nikki AT gameboutique DOT com