Xref: news2.mv.net comp.os.msdos.djgpp:4955
From: brennan AT mack DOT rt66 DOT com (Brennan "Bas" Underwood)
Newsgroups: comp.os.msdos.djgpp
Subject: Re: Speed optimization: memcpy() or for loop ??
Date: 13 Jun 1996 14:02:09 -0600
Organization: None, eh?
Lines: 40
Message-ID: <4pps41$dnp@mack.rt66.com>
References: <4pmlrp$p7u AT crc-news DOT doc DOT ca> <4pmscu$nrt AT rs18 DOT hrz DOT th-darmstadt DOT de>
NNTP-Posting-Host: mack.rt66.com
To: djgpp AT delorie DOT com
DJ-Gateway: from newsgroup comp.os.msdos.djgpp

In article <4pmscu$nrt AT rs18 DOT hrz DOT th-darmstadt DOT de>,
Alexander Lehmann wrote:
>Richard Young (richard DOT young AT crc DOT doc DOT ca) wrote:
>: A question for the optimization experts:
>: For moving data, is it faster to use
>:   a) memcpy(x,y,n*sizeof(x[0]))
>: or
>:   b) for (i = 0; i < n; i++) x[i] = y[i];
>: or are they basically the same speed.
>: With C++ is it better code practice to use b) over a)?
>
>(a) uses the function dj_movedata, which will use the repeat
>instruction to copy 4 byte values, which should be pretty fast.
>
>(b) requires a lots of address calculations, unless the compiler is
>very smart (I don't think so), but it can be sped up a bit at least
>(assuming that x and y are of type foo):

It's smart enough to use i as an offset to x and y, but that'll cost
at least 1 cycle/instruction on 486 and below.

The fastest way to move dword aligned memory on 486 and DOWN, is
rep movsl. On Pentium+, you can beat it under the right circumstances,
but it's very difficult. I saw one trick for using many pushes/pops
after setting up esp, but it still didn't see a major gain over
rep movsl.

rep movsl does one dword/cycle (cause it uses both pipes internally.)
Very hard to beat.

Check out http://www.rt66.com/~brennan/djgpp/bgtia.html for a couple
rep movsl inline assembly macros, *iff* you are doing dword aligned
stuff.

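For reference, here is a minimal sketch of the three approaches discussed
above, assuming GCC-style inline assembly on an x86 target. The function
names copy_memcpy, copy_loop and copy_dwords are made up for illustration;
they are not the macros from the page linked above and not DJGPP library
functions. On any other compiler or CPU, copy_dwords simply falls back to
memcpy. No timings are implied: which variant wins depends on the CPU and
the compiler, so measure on the machine you care about.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* (a) Library copy of n 4-byte elements. */
void copy_memcpy(uint32_t *dst, const uint32_t *src, size_t n)
{
    memcpy(dst, src, n * sizeof dst[0]);
}

/* (b) Plain indexed loop; the compiler is free to optimize it. */
void copy_loop(uint32_t *dst, const uint32_t *src, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        dst[i] = src[i];
}

/* (c) Hypothetical rep movsl wrapper for dword-aligned, non-overlapping
 *     buffers. n is a count of dwords, not bytes. */
void copy_dwords(uint32_t *dst, const uint32_t *src, size_t n)
{
    /* The post recommends this only for dword-aligned data. */
    assert(((uintptr_t)dst & 3u) == 0 && ((uintptr_t)src & 3u) == 0);

#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    /* DI = destination, SI = source, CX = dword count.
     * The instruction updates all three, so they are in/out operands. */
    __asm__ __volatile__("cld\n\t"
                         "rep\n\t"
                         "movsl"
                         : "+D"(dst), "+S"(src), "+c"(n)
                         :
                         : "memory", "cc");
#else
    /* Assumption: no x86 inline assembly available, so defer to libc. */
    memcpy(dst, src, n * sizeof dst[0]);
#endif
}
```

All three take the element count n, so copy_dwords(x, y, n) is a drop-in
for the loop in (b) when x and y point at 4-byte elements.
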
You *can* get a major increase by doing a tight loop of loads/stores
with the FPU since it can work with 8-byte long longs, but you'll be
in for an interesting time if you happen to load any of the FPU error
bit patterns! e.g. NotANumber, Divide by Zero, or something to that
effect.

Read comp.lang.asm.x86; some really good performance coders hang out
there.

--Brennan
--
brennan AT rt66 DOT com  |  "He say you Brade Runna!"