Date: Tue, 08 Jul 1997 14:12:37 +0000
From: Bill Currie <billc AT blackmagic DOT tait DOT co DOT nz>
Subject: Re: inline asm ("g" or "a" for input... which is faster?) and memory.h
To: Jeff Weeks <pweeks AT execulink DOT com>
Cc: djgpp AT delorie DOT com
Reply-to: billc AT blackmagic DOT tait DOT co DOT nz
Message-id: <33C24AD5.566E@blackmagic.tait.co.nz>
Organization: Tait Electronics NZ
References: <33C11894 DOT A1A9BE AT execulink DOT com>

Jeff Weeks wrote:
> 
> __asm__ ("movw %%bx, (%%eax)" : : "a" (address), "b" (colour) );
> 
> However, would this be faster as:
> 
> __asm__ ("movw %2, (%1)" : : "g" (address), "g" (colour) );
> 
> At first I thought the first would be faster, but then I thought, 'What
> if it has to push eax and ebx and then pop them back again?' Then it makes
> sense that any available register (g) would be faster.

Almost there. What you really want is:

__asm__("movw %w1,%0" : "=g"(*address) : "r"(colour));

This way, gcc also gets to decide how to access the variable ("m" might
be better than "g" if it is always memory you are working with) and which
register to use. gcc will produce something like "movw
%ax,_color_table+50" if the address is known at compile time. Also, if
colour is a compile-time (assemble-time) constant, replacing the "r" with
an "i" will generate an immediate-mode instruction; "ri" lets gcc choose
whichever form fits.
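
A minimal sketch of that store, assuming `address` is a pointer to a 16-bit pixel as in the original question. The names `put_pixel16` and `color_table` are hypothetical, and it uses "=m" plus "ri" so that a constant colour becomes an immediate. Non-x86 targets fall back to plain C.

```c
#include <stdint.h>

/* Hypothetical 16-bit frame buffer. When the index is a constant, gcc can
   address the element directly (something like color_table+50). */
uint16_t color_table[320];

/* Store one 16-bit pixel through a pointer (put_pixel16 is a made-up name). */
static inline void put_pixel16(uint16_t *address, uint16_t colour)
{
#if defined(__i386__) || defined(__x86_64__)
    /* %0 is the memory operand *address, chosen by gcc ("=m").
       %w1 is operand 1 in 16-bit form: a register, or an immediate
       when colour is a compile-time constant ("ri"). */
    __asm__("movw %w1, %0" : "=m"(*address) : "ri"(colour));
#else
    *address = colour;  /* plain C fallback for non-x86 targets */
#endif
}
```

With `gcc -O2 -S`, a call such as `put_pixel16(&color_table[25], 0x1234)` typically becomes a single immediate store along the lines of `movw $4660, color_table+50`. The exact syntax varies with the target: DJGPP adds a leading underscore, and x86-64 uses RIP-relative addressing.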

BTW, the 'w' modifier selects the 16-bit (short) name of the register,
'b' the byte name, and 'k' the 32-bit (long) name. Unfortunately, I don't
think there is any way of accessing the high byte registers (%ah etc).
Watch out with 'r': it allows gcc to use %esi, %edi and %ebp, which have
no byte-sized names, so a byte access such as %b1 cannot work with them.
Use 'q' to limit gcc's choices to %e(a/b/c/d)x.
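
A sketch of the byte and long variants, assuming the same pointer-to-pixel convention. The function names are hypothetical. On i386 the 'q' constraint keeps colour in %eax, %ebx, %ecx or %edx, so `%b1` always names a real byte register. On x86-64 every general register has a byte form, so 'q' behaves like 'r' there.

```c
#include <stdint.h>

/* Store the low byte of colour. "q" restricts gcc to registers that have
   a byte-sized name (%al, %bl, %cl, %dl on i386), so %b1 is always valid. */
static inline void put_pixel8(uint8_t *address, unsigned int colour)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__("movb %b1, %0" : "=m"(*address) : "q"(colour));
#else
    *address = (uint8_t)colour;  /* plain C fallback for non-x86 targets */
#endif
}

/* Store a full 32-bit value; %k1 names the 32-bit register (%eax, %esi, ...),
   and any general register is fine here, so "r" is enough. */
static inline void put_pixel32(uint32_t *address, uint32_t colour)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__("movl %k1, %0" : "=m"(*address) : "r"(colour));
#else
    *address = colour;
#endif
}
```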


> 
> This also brings up another point: Will GCC (DJGPP and Unix GCC)
> optimize inline asm?

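No. Apart from substituting the operands, gcc passes the asm text to the
assembler unchanged. It does, however, optimise the register setup and
cleanup around it (loading the inputs, storing the outputs), which is why
it is best to let gcc pick the registers whenever possible.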
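
To illustrate, here is a sketch comparing the two styles in a hypothetical line-fill loop. The names `hline_fixed` and `hline_free` are made up. Both bodies are a single `movw`; the difference lies only in what gcc has to do around it.

```c
#include <stddef.h>
#include <stdint.h>

/* Fixed registers, as in the original question: the pointer must be in
   %eax/%rax and colour in %ebx (%bx) for every store. Since %ebx is
   callee-saved, gcc must also save and restore it in this function. */
void hline_fixed(uint16_t *row, size_t n, uint16_t colour)
{
    for (size_t i = 0; i < n; i++) {
#if defined(__i386__) || defined(__x86_64__)
        /* No output operands, so the "memory" clobber tells gcc that
           the asm writes to memory it cannot otherwise see. */
        __asm__("movw %w1, (%0)" : : "a"(row + i), "b"(colour) : "memory");
#else
        row[i] = colour;  /* plain C fallback for non-x86 targets */
#endif
    }
}

/* Free choice: gcc picks the addressing mode for row[i] ("=m") and any
   register or immediate for colour ("ri"), so it can leave values where
   they already are across iterations. */
void hline_free(uint16_t *row, size_t n, uint16_t colour)
{
    for (size_t i = 0; i < n; i++) {
#if defined(__i386__) || defined(__x86_64__)
        __asm__("movw %w1, %0" : "=m"(row[i]) : "ri"(colour));
#else
        row[i] = colour;
#endif
    }
}
```

The easiest way to see the extra register moves in the fixed version is to compare the `gcc -O2 -S` output of the two functions. The exact code varies with the compiler version and target.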

> 
> And lastly, do memcpy, memmove, memset and so on alter memory a dword
> at a time? Do they use rep movs?, rep stos? and so on? Or do they use
> a technique I've seen before, where it's just a tight inner loop that
> uses mov to copy a few dwords at a time then loops back (supposed to be
> faster than rep ? on a 486+)?

It depends on the function and the parameters. Sometimes they are
compiled as calls into libc (the code there is well optimised); other
times gcc inlines them, and the implementation depends on the size of
the transfer.
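
One way to see this for yourself is to compile two small functions with `gcc -O2 -S` and compare the output. This sketch assumes a reasonably recent gcc, and the function names are hypothetical. Whether a call is inlined, and whether the inlined form uses `rep movs` or plain `mov`s, depends on the gcc version, the `-march`/`-mtune` setting and the size.

```c
#include <stddef.h>
#include <string.h>

/* Size known at compile time: at -O2 gcc usually expands this inline
   as a few moves, with no call to memcpy at all. */
void copy16(void *dst, const void *src)
{
    memcpy(dst, src, 16);
}

/* Size known only at run time: gcc normally emits a call to the
   library memcpy, which chooses its own copying strategy. */
void copy_n(void *dst, const void *src, size_t n)
{
    memcpy(dst, src, n);
}
```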

Bill
-- 
Leave others their otherness.