From: nikki AT gameboutique DOT co (nikki)
Newsgroups: comp.os.msdos.djgpp
Subject: Re: memcpy(); is there something faster?
Date: 26 Feb 1997 09:54:26 GMT
Organization: GameBoutique Ltd.
Lines: 33
Message-ID: <5f118i$iqt@flex.uunet.pipex.com>
References: <59g08k$758_001 AT cpe DOT Maroochydore DOT aone DOT net DOT au> <5euboi$296 AT flex DOT uunet DOT pipex DOT com> <5f05mt$s7s$1 AT doffen DOT uninett DOT no>
NNTP-Posting-Host: www.gameboutique.com
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
To: djgpp AT delorie DOT com
DJ-Gateway: from newsgroup comp.os.msdos.djgpp

> how can they cause an exception? When moving 8 bytes at a time (with
> fild/fistp) you're using integers all the way. If you on the other hand use
> fld/fstp (and can copy up to 10 bytes at a time, but lose accuracy and get
> exceptions and additional cache misses) I understand.

Well, exactly as you said, really. Using fild/fistp is fine, and you can move
16 bytes in 16 cycles with half the normal number of write cache misses.
Unfortunately this is far too slow on anything other than a Pentium, whether
older or newer.

Now, if you could replace those nasty NP instructions (fistp) with a nice
fstp, you'd save lots of cycles and the loop would become very beneficial
indeed.

It occurred to me that if you knew the stack address where fild has loaded
these values, couldn't you take them off with two movl's instead and
artificially pop the stack? That is, do the store part with integer
instructions. I think this would gain about 4 cycles per 16 bytes (16 bytes
in 12 cycles), and 4/3 bytes/cycle isn't bad after all. However, I suspect
rep movsd will be faster still.

regards,
nik

>
>
>: 'safe' way which gives you a transfer rate of 16bytes/16cycles and 1/2 as
>: many cache write misses. if it's 686 or higher you get 1/4 write cache misses.
>: sadly, if it's 686 or higher the rep movsd will go faster ;(
>: basically fpu memcopy is not all it's cracked up to be.
>
> who's got a 686 anyway (or PPro?) ?
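For comparison, here is a minimal portable C sketch of the "integers all the way" point from the quoted text: the buffer is moved in 8-byte chunks through a 64-bit integer type, so no floating-point load/store can alter the bit pattern or raise an exception. It is not the fild/fistp loop itself (there is no x87 code here), the name copy64 is hypothetical, and whether it beats rep movsd depends on the CPU and compiler.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy n bytes from src to dst in 8-byte integer
   chunks, then finish the remaining 0-7 bytes one at a time.
   Buffers must not overlap (same contract as memcpy). */
void copy64(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    /* Main loop: each chunk passes through a uint64_t, so the bits are
       moved unchanged (the reason to prefer fild/fistp over fld/fstp).
       A fixed-size memcpy avoids alignment and aliasing problems and is
       usually compiled to a single 8-byte load and store. */
    while (n >= 8) {
        uint64_t chunk;
        memcpy(&chunk, s, sizeof chunk);
        memcpy(d, &chunk, sizeof chunk);
        s += 8;
        d += 8;
        n -= 8;
    }

    /* Tail: fewer than 8 bytes left. */
    while (n > 0) {
        *d++ = *s++;
        n--;
    }
}
```

For example, copy64(dst, src, 16) moves two 8-byte chunks, the same 16-byte unit the cycle counts above refer to.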
--
Graham Tootell
nikki AT gameboutique DOT com