Mail Archives: djgpp/1995/08/15/15:10:51
>>>>> "A" == A Appleyard <A DOT APPLEYARD AT fs2 DOT mt DOT umist DOT ac DOT uk> writes:
>> You must tell gcc about input and output values, and which
>> registers get clobbered. ... This code hides from gcc the fact
>> that __ax, etc. are really input and output parameters. gcc is
>> free to assume that those variables do not get modified by this
>> asm, and optimize accordingly.
A> I use these two macros as a pair thus, e.g.:- _ax=0x1234;
A> etc; __SR(); asm("this and that"); __RR(); use_value_of(_ax);
A> The original registers are saved by __SR() and restored by
A> __RR(), and I do explicitly alter _ax in C text.
Unfortunately, that's not a legitimate way to write it (although I
believe you that it works with the current gcc release). Altering
_ax in C text isn't the issue; the problem is that the asm secretly
alters _ax, and also requires its value to be valid on input, without
telling gcc either fact. Writing it this way also causes a lot of
pointless memory traffic (saving and restoring *all* the registers).
Anyway, it's lame of me to criticize you without offering my own
example code. If you can give me a small example of some __SR/__RR
asm of this form that you're using I can show you how I'd rewrite it.
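For a rough idea of what declaring inputs, outputs, and clobbers looks
like, here is a minimal sketch. It is not a rewrite of the __SR/__RR
macros (their bodies are not shown here). The function name add_via_asm
and the use of %ecx as a scratch register are made up for illustration,
and the asm path assumes an x86 target and GNU extended asm; other
targets fall back to plain C.

```c
#include <stdint.h>

/* Hypothetical example: add two values through an asm statement while
 * telling gcc exactly what the asm reads, writes, and destroys. */
static uint32_t add_via_asm(uint32_t a, uint32_t b)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    uint32_t sum;
    __asm__ ("movl %2, %%ecx\n\t"   /* copy b into the scratch register */
             "addl %%ecx, %0"       /* sum (which starts out as a) += ecx */
             : "=r" (sum)           /* output: written by the asm */
             : "0" (a),             /* input: same register as operand 0 */
               "r" (b)              /* input: any general register */
             : "ecx", "cc");        /* clobbered: scratch register, flags */
    return sum;
#else
    /* Assumption: non-x86 builds just use the C equivalent. */
    return a + b;
#endif
}
```

Because %ecx is listed as clobbered, gcc will not keep a live value in
it across the asm, so there is no need to save and restore every
register by hand.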
>> xchgl is really slow, at least on the Pentium.
A> How many times slower than a `movl'?
That's a fair question. I found out xchgl was much slower when
writing some asm glue for the Linux Checker tool; the glue was short
but disappointingly slow. Punting the two (?) xchgl's sped it up
*enormously*. The xchgl's were something like 2 of the 13 instructions but
took half the run time (my memory is vague, so I could be off somewhat,
but I was timing this stuff with the Pentium's rdtsc instruction so
I'm certain they were Very Slow). I'm confident xchgl is more than a
factor of two slower than movl (on the Pentium!), and of course it's
not pairable on the Pentium at all. movl is UV (i.e. perfectly)
pairable.
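As a sketch of what "punting" an xchgl can look like, the two
hypothetical functions below swap a pair of registers, once with xchgl
and once with three movl's through a scratch register. The names
swap_xchg and swap_mov are made up, the asm path assumes x86 and GNU
extended asm, and the code only shows that the two forms are
equivalent; it makes no timing claim, and relative cost on other
processors may well differ.

```c
#include <stdint.h>

#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#define HAVE_X86_ASM 1
#else
#define HAVE_X86_ASM 0
#endif

/* Swap using a register-to-register xchgl. */
static void swap_xchg(uint32_t *x, uint32_t *y)
{
    uint32_t a = *x, b = *y;
#if HAVE_X86_ASM
    __asm__ ("xchgl %0, %1"
             : "+r" (a), "+r" (b));   /* both operands read and written */
#else
    uint32_t t = a; a = b; b = t;     /* assumption: plain C fallback */
#endif
    *x = a;
    *y = b;
}

/* Same swap using three movl's and a scratch register chosen by gcc. */
static void swap_mov(uint32_t *x, uint32_t *y)
{
    uint32_t a = *x, b = *y, t;
#if HAVE_X86_ASM
    __asm__ ("movl %0, %2\n\t"        /* t = a */
             "movl %1, %0\n\t"        /* a = b */
             "movl %2, %1"            /* b = t */
             : "+r" (a), "+r" (b),
               "=&r" (t));            /* scratch: written before inputs are done */
#else
    t = a; a = b; b = t;
#endif
    *x = a;
    *y = b;
}
```

To compare the two on a given machine, each would have to be timed in
place (for example with rdtsc, as described above).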
-Mat