From: vcarlos35 AT juno DOT com
To: djgpp AT delorie DOT com
Date: Wed, 30 Dec 1998 09:04:24 EST
Subject: Re: pairable instructions much faster than the string operations on a Pentium and above ?!
Message-ID: <19981230.090441.5903.0.vcarlos35@juno.com>
References: <368A195D DOT F315167E AT gmx DOT de>
X-Mailer: Juno 1.49
X-Juno-Line-Breaks: 0,2-3,5-38,40-42,44,46-49
Reply-To: djgpp AT delorie DOT com

On Wed, 30 Dec 1998 13:15:25 +0100 Christian Hofrichter writes:
>For a long time I believed that string operations (rep stosl; rep
>movsl) were the fastest methods to write to memory blocks, until I heard
>that a Pentium can execute two instructions simultaneously. So I realized
>that there are better methods to move memory blocks!
>
>" rep stosl " : takes 3 clock cycles per doubleword on a Pentium
>
>
>unsigned long count = (40*1024*1024) >> 2; /* doublewords in 40 MB */
>void *p = memory;                          /* copy: the asm advances it */
>asm volatile("1:\n\t"
>    "movl %%eax,(%%ebx)\n\t"  /* store EAX at [EBX]: pairable in U-pipe */
>    "addl $4,%%ebx\n\t"       /* advance pointer:    pairable in V-pipe */
>    "decl %%ecx\n\t"          /* count down:         pairable in U-pipe */
>    "jnz 1b"                  /* loop:               pairable in V-pipe */
>    : "+c"(count), "+b"(p)    /* ECX and EBX are modified by the loop */
>    : "a"(55)                 /* any fill value */
>    : "memory", "cc");
>This takes only 2 clock cycles per iteration!
>
>
>To test that, I allocated a buffer of 40 MB. First I used memset; it
>took 690000 microseconds to fill the memory block.
>Then I wrote it in assembler (just to be sure) with stosl, and it took
>the same time (how surprising).
>And then I wrote the code above, and now it took only approximately
>426000 microseconds to fill the memory block!!
>That is approximately the same ratio as 3 clock cycles to 2 clock
>cycles.
>
>So how about a new optimization switch in djgpp, called pairable
>instructions? After all, it can often double the speed of the
>program. It can also be used to improve graphics performance, can't it?

AFAIK, having a compiler automatically pair instructions (especially one
such as gcc, which runs on a wide variety of platforms) would pretty much
be an impossible task. Instruction pairing rules are complicated and
dependent on the CPU to a great extent.
For example, on a 6th generation CPU, your code is not optimal because
the increased register dependencies make it difficult for the
out-of-order core to extract maximum parallelism from your code.
Additionally, you have to worry about increased aggregate opcode size
and mispredicted branches.

Karl
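For anyone who wants to try the idea without inline assembly, here is a minimal portable C sketch (an illustration, not code from either post). `fill32` mirrors the quoted loop: one 32-bit store, a pointer advance and a count-down per iteration. `fill32_unrolled` applies loop unrolling, a different technique from hand pairing: each iteration carries four independent stores, so the loop-control branch runs a quarter as often. Both function names are hypothetical, and whether either one beats `memset` or `rep stosl` on a given CPU is an assumption to be measured, not a result.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: fill n 32-bit words at dst with value, one store
   per iteration, like the quoted asm loop (store, advance, count down). */
void fill32(uint32_t *dst, uint32_t value, size_t n)
{
    while (n--)
        *dst++ = value;
}

/* Hypothetical helper: same result, unrolled by four. The four stores in
   the main loop do not depend on one another, so a superscalar or
   out-of-order CPU may overlap them, and the add/compare/branch that
   controls the loop executes once per four words instead of once per word. */
void fill32_unrolled(uint32_t *dst, uint32_t value, size_t n)
{
    assert(dst != NULL || n == 0);

    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        dst[i]     = value;
        dst[i + 1] = value;
        dst[i + 2] = value;
        dst[i + 3] = value;
    }

    /* Tail: the remaining 0 to 3 words. */
    for (; i < n; i++)
        dst[i] = value;
}
```

An optimizing compiler may vectorize or otherwise rewrite either loop, so timing them means checking the generated assembly (for example with `gcc -S`) rather than assuming the C maps one-to-one onto instructions.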