Date: Tue, 08 Jul 1997 14:12:37 +0000
From: Bill Currie
Subject: Re: inline asm ("g" or "a" for input... which is faster?) and memory.h
To: Jeff Weeks
Cc: djgpp AT delorie DOT com
Reply-to: billc AT blackmagic DOT tait DOT co DOT nz
Message-id: <33C24AD5.566E@blackmagic.tait.co.nz>
Organization: Tait Electronics NZ
MIME-version: 1.0
Content-type: text/plain; charset=us-ascii
Content-transfer-encoding: 7bit
References: <33C11894 DOT A1A9BE AT execulink DOT com>
Precedence: bulk

Jeff Weeks wrote:
>
> __asm__ ("movw %%bx, (%%eax)" : : "a" (address), "b" (colour) );
>
> However, would this be faster as:
>
> __asm__ ("movw %2, (%1)" : : "g" (address), "g" (colour) );
>
> At first I thought the first would be faster, but then I thought 'what
> if it has to push eax and ebx then pop them back again'  Then it makes
> sence that any available register (g) would be faster.

Almost there. What you really want is:

__asm__("movw %w1,%0" : "=g"(*address) : "r"(colour));

Note that the output operand is the memory being written (*address), not the pointer itself. This way, gcc also gets to decide how to access the variable ("m" might be better than "g" if it is always memory you are working with) and what register to use. gcc will produce something like "movw %ax,_color_table+50" if the address is known at compile time. Also, if you replace the "r" with an "i", colour can be a constant (assemble time) and an immediate mode instruction will be generated.

BTW the 'w' specifies that a short int register (2 bytes) is to be used, 'b' is for byte, and 'k' for longs (4 bytes). For the high byte registers (%ah etc.) there is an 'h' modifier, but the operand then has to live in one of %e(a/b/c/d)x. Watch out with 'r': it allows the use of %esi, %edi and %ebp, which have no byte-sized forms; use 'q' to limit gcc's choices to %e(a/b/c/d)x.

>
> This also brings up another point: Will GCC (DJGPP and Unix GCC)
> optomize inline asm?

No, but it will optimise any register setup/cleanup required for the inline assembly, which is why it is best to let gcc pick the register whenever possible.
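The operand forms discussed above can be put side by side. This is only a minimal sketch, assuming gcc's extended asm on an x86 target and a 16-bit colour value. The function names put_pixel, put_pixel_q and put_pixel_const, and the constant 0x7FFF, are made up for illustration. The destination is written as *address so that the store goes to the pixel and not to the pointer variable, and a plain C store is used as a fallback on other targets.

```c
/* Hypothetical helpers, not from the original post. */

/* "=m": gcc picks the addressing mode for the destination.
   "r":  gcc picks any general register for colour;
   %w1 prints that register's 16-bit name (e.g. %ax, %si). */
static inline void put_pixel(unsigned short *address, unsigned short colour)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __asm__ ("movw %w1,%0" : "=m" (*address) : "r" (colour));
#else
    *address = colour;              /* fallback for non-x86 targets */
#endif
}

/* Same store, but "q" restricts the source to the a/b/c/d registers. */
static inline void put_pixel_q(unsigned short *address, unsigned short colour)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __asm__ ("movw %w1,%0" : "=m" (*address) : "q" (colour));
#else
    *address = colour;
#endif
}

/* "i": the source must be a compile-time constant, so gcc emits an
   immediate-mode store and no register is needed for the colour. */
static inline void put_pixel_const(unsigned short *address)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __asm__ ("movw %1,%0" : "=m" (*address) : "i" (0x7FFF));
#else
    *address = 0x7FFF;
#endif
}
```

Compiling with "gcc -O2 -S" and reading the generated assembly is the easiest way to see which register and addressing mode gcc actually chose for each variant.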
>
> And lastly, does memcpy, memmove, memset and so on, alter memory dwords
> at a time?  Do they use rep movs?, rep stos? and so on?  Or do they use
> a technique I've seen before, where it's just a tight inner loop that
> uses mov to copy a few dwords at a time then loops back (supposed to be
> faster then rep ? on a 486+).

Depends on the function and the parameters. Sometimes they are function calls (the code in libc is well optimised) and at other times they are inlined, in which case the implementation depends on the size of the transfer.

Bill
--
Leave others their otherness.