Mail Archives: djgpp/1995/08/15/15:10:51

Date: Tue, 15 Aug 95 12:48 MDT
From: mat AT ardi DOT com (Mat Hostetter)
To: "A.Appleyard" <A DOT APPLEYARD AT fs2 DOT mt DOT umist DOT ac DOT uk>
Cc: djgpp AT sun DOT soe DOT clarkson DOT edu
Subject: Re: inline asm?
References: <A852EB87734 AT fs2 DOT mt DOT umist DOT ac DOT uk>

>>>>> "A" == A Appleyard <A DOT APPLEYARD AT fs2 DOT mt DOT umist DOT ac DOT uk> writes:

    >> You must tell gcc about input and output values, and which
    >> registers get clobbered. ... This code hides from gcc the fact
    >> that __ax, etc. are really input and output parameters. gcc is
    >> free to assume that those variables do not get modified by this
    >> asm, and optimize accordingly.

    A>   I use these two macros as a pair thus, e.g.:- _ax=0x1234;
    A> etc; __SR(); asm("this and that"); __RR(); use_value_of(_ax);
    A> The original registers are saved by __SR() and restored by
    A> __RR(), and I do explicitly alter _ax in C text.

Unfortunately, that's not a legitimate way to write it (although I
believe you that it will work with the current gcc release).  Altering
_ax in C text isn't the issue; the problem is that the asm secretly
alters _ax, and also requires that its value be valid on input, and
gcc is told about neither.  This way of writing it also does lots of
pointless memory thrashing (to save/restore *all* the registers).
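
To make the point concrete, here is a minimal sketch of an asm that
declares its operands instead of hiding them.  It is not a rewrite of
the __SR/__RR macros (I haven't seen the asm they wrap); the function
name swap_bytes16 and the choice of rolw are hypothetical, picked only
to show an input, an output, and a clobber being declared.  The plain C
fallback is an assumption so the sketch still builds on non-x86
targets.

```c
#include <stdint.h>

/* Hypothetical example: swap the two bytes of a 16-bit value.
 * The constraints tell gcc everything the asm does:
 *   "=r" (out)  the asm writes this register (output)
 *   "0"  (in)   the asm reads `in`, placed in the same register as %0
 *   "cc"        the asm changes the condition codes (clobber)
 * No other registers are touched, so nothing needs saving by hand. */
static uint16_t swap_bytes16(uint16_t in)
{
    uint16_t out;
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __asm__ ("rolw $8, %0"
             : "=r" (out)
             : "0" (in)
             : "cc");
#else
    /* Assumed fallback for non-x86 targets: same result in plain C. */
    out = (uint16_t)((in << 8) | (in >> 8));
#endif
    return out;
}
```

With the operands declared, gcc knows the value is consumed and
replaced by the asm, so something like swap_bytes16(0x1234) can be
optimized around safely, and gcc picks the register itself rather than
having every register saved and restored.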

Anyway, it's lame of me to criticize you without offering my own
example code.  If you can give me a small example of some __SR/__RR
asm of this form that you're using I can show you how I'd rewrite it.

    >> xchgl is really slow, at least on the Pentium.

    A>   How many times slower than a `movl'?

That's a fair question.  I found out xchgl was much slower when
writing some asm glue for the Linux Checker tool; the glue was short
but disappointingly slow.  Dropping the two (?) xchgl's sped it up
*enormously*.  The xchgl's were something like 2 of the 13
instructions but took half the time (my memory is vague, so I could be
off somewhat, but I was timing this stuff with the Pentium's rdtsc
instruction so I'm certain they were Very Slow).  I'm confident xchgl
is more than a factor of two slower than movl (on the Pentium!), and
of course it's not pairable on the Pentium at all.  movl is UV
pairable, i.e. it can issue in either the U or the V pipe.
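
For anyone who wants to repeat the measurement, here is a rough sketch
of timing an xchgl swap against a three-movl swap with rdtsc.  This is
not the Checker glue; the names read_tsc, swap_xchg, swap_mov and
time_swaps are all hypothetical, and the non-x86 fallbacks are an
assumption so the sketch still builds elsewhere.  The loop also counts
call and load/store overhead, so treat the numbers as a comparison
only, and expect them to vary with the CPU.

```c
#include <stdint.h>

#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#define HAVE_X86_ASM 1
#else
#define HAVE_X86_ASM 0
#endif

/* Read the time-stamp counter (returns 0 where rdtsc is unavailable). */
static uint64_t read_tsc(void)
{
#if HAVE_X86_ASM
    uint32_t lo, hi;
    __asm__ __volatile__ ("rdtsc" : "=a" (lo), "=d" (hi));
    return ((uint64_t)hi << 32) | lo;
#else
    return 0;
#endif
}

/* Swap two values held in registers with a single xchgl. */
static void swap_xchg(uint32_t *a, uint32_t *b)
{
    uint32_t x = *a, y = *b;
#if HAVE_X86_ASM
    __asm__ __volatile__ ("xchgl %0, %1" : "+r" (x), "+r" (y));
#else
    uint32_t t = x; x = y; y = t;
#endif
    *a = x;
    *b = y;
}

/* Same swap using three movl's and a scratch register. */
static void swap_mov(uint32_t *a, uint32_t *b)
{
    uint32_t x = *a, y = *b;
#if HAVE_X86_ASM
    uint32_t tmp;
    __asm__ __volatile__ ("movl %1, %0\n\t"   /* tmp = x */
                          "movl %2, %1\n\t"   /* x = y   */
                          "movl %0, %2"       /* y = tmp */
                          : "=&r" (tmp), "+r" (x), "+r" (y));
#else
    uint32_t t = x; x = y; y = t;
#endif
    *a = x;
    *b = y;
}

/* Cycles taken by n calls of the given swap routine. */
static uint64_t time_swaps(void (*swap)(uint32_t *, uint32_t *), unsigned n)
{
    uint32_t a = 1, b = 2;
    uint64_t start = read_tsc();
    for (unsigned i = 0; i < n; i++)
        swap(&a, &b);
    return read_tsc() - start;
}
```

Calling time_swaps(swap_xchg, n) and time_swaps(swap_mov, n) with the
same n and comparing the two counts gives a crude per-swap figure.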

-Mat
